Abstract

How to quantify the distance between any two partitions of a finite set
is an important issue in statistical classification, whenever different
clustering results need to be compared. Developing from the traditional
Hamming distance between subsets, i.e. the cardinality of their symmetric
difference, this work considers alternative metric distances between
partitions. With one exception, all of them are obtained as minimum-weight
paths in the undirected graph corresponding to the Hasse diagram of the
partition lattice. Firstly, by focusing on the atoms of the lattice, one
well-known partition distance is recognized to be in fact the analog of
the Hamming distance between subsets, with weights on edges of the Hasse
diagram determined through the number of atoms in the unique maximal
join-decomposition of partitions. Secondly, another partition distance
known as "variation of information" is seen to correspond to a
minimum-weight path with edge weights determined by the entropy of
partitions. These two distances are next compared in terms of their upper
and lower bounds over all pairs of partitions that are complements of one
another. What emerges is that the two distances share the same minimizers
and maximizers, while a much rougher behavior is observed for the
partition distance which does not correspond to a minimum-weight path.
The idea of measuring the distance between partitions by means of
minimum-weight paths in the Hasse diagram is further explored by
considering alternative symmetric and order-preserving/inverting partition
functions (such as the rank, in the simplest case) for assigning weights
to edges. What matters most, in such a general setting, turns out to be
whether the weighting function is supermodular or else submodular, as this
makes any minimum-weight path visit the meet or else the join of the two
partitions, depending on whether the function is order-preserving or
order-inverting. Finally, two appendices are devoted respectively to a
definition of Euclidean distance between fuzzy partitions and to the
consensus partition (combinatorial optimization) problem.

Keywords: partition lattice, symmetric function, Hamming distance, 
Hasse diagram, geodesic distance, indicator function, graph of a polytope. 
1 Introduction

Partitions are key instruments in many application scenarios at the
interface of computer science, artificial intelligence and engineering,
including pattern recognition, data mining and bioinformatics, while also
being "of central importance in the study of symmetric functions, a class
of functions that pervades mathematics in general" [?, p. 39] (see also
[?, Chapter 5], [?] and [?, Chapter 7] on symmetric function theory).
Below, symmetric functions are employed to define metric distances between
partitions, which in turn are useful when different clustering results
need to be compared. In statistical classification, partitions of a data
set may indeed be referred to as "clusterings", although the latter term
relates to a richer set of structures than the former. The issue addressed
here typically arises since a local search clustering algorithm generally
provides different outputs when initialized with different candidate
solutions (or inputs). On the other hand, a given clustering algorithm may
allow for different parametrizations, each yielding different results for
the same data. Finally, alternative clustering algorithms commonly
partition the same set in alternative ways. In all these cases, a distance
measure is essential for assessing the proximity between diverse
partitions [?].

The issue has attracted considerable attention since the mid-1960s [?].
More recently, since measuring the distance between partitions of a
population is fundamental for sibling relationship reconstruction in
bioinformatics, several contributions over the last decade adopted a
combinatorial approach for studying one specific such distance measure,
here denoted by MMD as it relies on maximum matching [?]. More precisely,
MMD can be shown [?] to be computable via the assignment problem [?].
Also, in recent years sibship reconstruction has been tackled by means of
a further partition distance measure [?], obtained axiomatically from
information theory [31] and called variation of information VI.

In this work, entire families of metric distances between partitions are
considered, the principal aim being to have consistency and
generalizations in terms of order (i.e. lattice) theory. In fact, the
general leading idea is the same as in [?], namely to define distances
between elements that are (partially) comparable in terms of a binary
order relation, although attention is not limited to posets (partially
ordered sets), distributive lattices and semilattices, but mainly extends
to the geometric lattice of partitions. Metrics for distributive lattices
are usually defined in terms of valuations (or modular lattice functions,
such as the rank or cardinality of subsets in the Boolean case, see
below). Conversely, valuations of the partition lattice are constant
functions [?], and therefore useless for defining metrics. Thus, the
method proposed here relies on super/submodular lattice functions
(referred to as lower/upper valuations in [?]).

The first goal is to reproduce the traditional Hamming distance between
two subsets, given by the number of atoms of the subset lattice included
in either one but not in both (i.e. the cardinality of their symmetric
difference, see [?]). Such a benchmark is extended to the geometric
lattice of partitions by focusing on atoms and join-decompositions of
lattice elements [?]. While every subset admits a unique such
decomposition, involving a number of atoms equal to the cardinality (or
rank) of the subset, a generic partition admits different
join-decompositions, most of which are redundant. The number of atoms
involved in the unique maximal join-decomposition of a partition is here
referred to as the size of that partition, yielding a function taking
non-negative integer values, like the rank. In fact, the two coincide for
subset lattices but differ crucially for partition lattices. Roughly
speaking, replacing the rank with the size yields a (i.e. the) Hamming
distance between partitions, denoted by HD. Apart from the resulting
consonance in terms of ordered structures, HD and VI share important
characterizing axioms (see [?]). Computable through scalar products
between Boolean vectors, without any algorithmic issue, HD has a large
range, and thus fine measurement sensitivity too.

The traditional Hamming distance between two subsets of an n-set is also
the length of a shortest path between them in the Hasse diagram of the
Boolean lattice of subsets. Such a diagram is in fact the graph of the
polytope [?] given by the n-dimensional unit hypercube $[0,1]^n$; thus it
has $2^n$ vertices that bijectively correspond to subsets, and an edge
links any two vertices when the two corresponding subsets are comparable
in terms of the covering relation (see [?] and below). In order to have
exactly the same for the Hamming distance between partitions, the latter
must be seen to correspond bijectively to those graphs on n (labelled)
vertices each of whose components is complete. More precisely, denoting by
$K_N = (N, N_2)$ the complete graph on vertex set $N = \{1,\ldots,n\}$,
with $N_2 = \{\{i,j\} : 1 \le i < j \le n\}$, partitions correspond
bijectively to those graphs $G = (N,E)$, $E \subseteq N_2$, each of whose
components is a maximal complete subgraph or clique, and the geometric
lattice of partitions of N is the so-called polygon matroid defined on the
edges of $K_N$ [?, pp. 54, 259]. The associated Hasse diagram is thus
recognized to be the graph of a polytope strictly included in the
$\binom{n}{2}$-dimensional unit hypercube $[0,1]^{\binom{n}{2}}$.
Specifically, the $2^{\binom{n}{2}}$-set $\{0,1\}^{\binom{n}{2}}$ of
hypercube vertices identifies the $2^{\binom{n}{2}}$-set of distinct
graphs on vertex set N, whereas linear dependence [55] entails that
partitions only span $\mathcal{B}_n < 2^{\binom{n}{2}}$ hypercube
vertices, where $\mathcal{B}_n$ is the Bell number of partitions of an
n-set (n > 1) [?]. While the covering relation between subsets assigns a
unit weight to every edge of the n-cube [?], edges of the polytope of
partitions must be weighted through the size, which matches precisely the
number of edges of the $\binom{n}{2}$-cube that collapse into a unique
edge of the included polytope. With these weights, the Hamming distance HD
between partitions (like between subsets) is the minimum weight of a path
connecting them.

The analysis then continues by observing that the size may be replaced
with any alternative symmetric and (strictly) order-preserving/inverting
partition function, such as rank, entropy, logical entropy [14, 15] and
co-size (see below). Then, polytope edges have weights obtained as the
difference between the greater and the smaller value taken by the chosen
function on the associated endpoints. Accordingly, the distance between
two partitions remains the minimum weight of a path connecting them. In
particular, if the function assigning weights to edges is order-preserving
and supermodular (like the size) or else submodular (like the rank), then
the minimum-weight path between any two partitions visits their meet or
else their join, respectively. Analogous results hold for order-inverting
and symmetric functions which are either supermodular or else submodular.

Section 2 outlines the needed background, with emphasis on lattice
functions and Hamming distances in general, while Section 3 introduces the
proposed Hamming distance between partitions, including an axiomatic
characterization. Section 4 is devoted to bounding both the Hamming and
variation-of-information distances over all pairs of partitions that are
complements of one another. Section 5 frames distances as minimum-weight
paths in the Hasse diagram. Section 6 considers two further functions
assigning weights to edges, namely logical entropy and co-size. Sections 7
and 8 are two appendices detailing respectively a definition of Euclidean
distance between fuzzy partitions and an exact solution for the consensus
partition (combinatorial optimization) problem. Section 9 concludes the
paper with some final remarks.


2 Preliminaries 

Throughout this work, the general concern is with metric distances
$d(x,y)$ between elements $x,y \in X$ of a poset $(X,\geqslant)$, i.e. a
(finite) set X endowed with a partial order relation $\geqslant$.
Additionally, X shall also be endowed with the meet $\wedge$ and join
$\vee$ operators, so that $(X,\wedge,\vee)$ is a (complete) lattice (see
[11]). The ordered structures to be considered are grounded on a finite
set $N = \{1,\ldots,n\}$, where integers $1,\ldots,n$ possibly denote the
indices of a data set. In particular, attention is going to be placed on
the Boolean lattice $(2^N,\cap,\cup)$ of subsets of N ordered by inclusion
$\supseteq$ and, mostly, on the geometric lattice
$(\mathcal{P}^N,\wedge,\vee)$ of partitions of N ordered by coarsening
$\geqslant$ (see [2, 51]). Generic subsets and partitions are denoted
respectively by $A,B \in 2^N$ and $P,Q \in \mathcal{P}^N$. Recall that a
partition $P = \{A_1,\ldots,A_{|P|}\}$ is a collection of (non-empty)
pairwise disjoint subsets, called blocks, whose union is N. For any
$P,Q \in \mathcal{P}^N$, if $P \geqslant Q$, then every block $B \in Q$ is
included in some block $A \in P$, i.e. $A \supseteq B$. Hence the bottom
partition is $P_\bot = \{\{1\},\ldots,\{n\}\}$ (like the bottom subset is
$\emptyset$), while the top one is $P^\top = \{N\}$ (like N is the top
subset). Also, among partitions the meet $\wedge$ is the
coarsest-finer-than operator, while the join $\vee$ is the
finest-coarser-than operator. The number $|\mathcal{P}^N| = \mathcal{B}_n$
of partitions of N is defined recursively by $\mathcal{B}_0 := 1$ and
$\mathcal{B}_n = \sum_{0 \le k < n}\binom{n-1}{k}\mathcal{B}_k$ (see [?]
on Bell numbers).
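The recursion for Bell numbers can be checked numerically. The following is a minimal sketch in Python; the function name `bell_numbers` is an illustrative choice and does not come from the text.

```python
from math import comb

def bell_numbers(n_max):
    """Return [B_0, ..., B_{n_max}] via B_n = sum_{0<=k<n} C(n-1, k) * B_k."""
    bell = [1]  # B_0 := 1
    for n in range(1, n_max + 1):
        bell.append(sum(comb(n - 1, k) * bell[k] for k in range(n)))
    return bell

print(bell_numbers(7))
```

For n = 3 this gives 5, the number of partitions of a 3-set.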

For all ordered pairs $(x,y) \in X \times X$ of poset elements, the
associated interval or segment is
$[x,y] = \{z : x \leqslant z \leqslant y\} \subseteq X$, and y is said to
cover x, denoted by $y \gtrdot x$, if $[x,y] = \{x,y\}$. The Hasse diagram
of poset $(X,\geqslant)$ is the graph $G = (X,E)$ whose vertices are
elements $x \in X$ and whose edges are given by the covering relation,
i.e. $E = \{\{x,y\} : [x,y] = \{x,y\}\}$. Although these edges are
sometimes assumed to be directed, thereby also indicating which elements
are covered/covering, in the present setting they are more fruitfully
regarded as undirected, for this allows one to consider paths where edges
may be used in both directions. In fact, the distance between any two
vertices in a graph is the length of any shortest path between them. More
generally, if the graph is weighted, meaning that every edge has an
associated (strictly positive) weight, then the distance between any two
vertices is the weight of a lightest path between them, where the weight
of a path is the sum of the weights of its edges.

In a lattice $(X,\wedge,\vee)$ with bottom element $x_\bot$, the set
$X_{\mathcal{A}} = \{x : x \gtrdot x_\bot\}$ of atoms consists of all
lattice elements that cover the bottom one. In atomic lattices, every
element $x \in X$ admits a decomposition $x = a_1 \vee \cdots \vee a_k$ as
a join of atoms $a_1,\ldots,a_k \in X_{\mathcal{A}}$. Both the Boolean
lattice $(2^N,\cap,\cup)$ of subsets of N and the geometric lattice
$(\mathcal{P}^N,\wedge,\vee)$ of partitions of N are atomic. For the
former, atoms are the n singletons $\{i\}, i \in N$. For the latter, atoms
are the $\binom{n}{2}$ partitions consisting of $n-1$ blocks, out of which
$n-2$ are singletons while the remaining one is a pair. Most importantly,
every subset $A \in 2^N$ admits a unique join-decomposition, namely
$A = \bigcup_{i \in A}\{i\}$. Conversely, partitions generally admit
several join-decompositions. However, every partition
$P \in \mathcal{P}^N$ admits a unique maximal join-decomposition, which
includes all atoms finer than P. In the sequel, a great deal of attention
shall be placed on the number of atoms finer than any given partition, to
be referred to as the size of partitions.
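As a small illustration of the maximal join-decomposition, the sketch below lists the atoms finer than a given partition, i.e. the pairs {i, j} contained in one of its blocks. Representing a partition as a list of Python sets, and the name `atoms_below`, are assumptions made only for this example.

```python
from itertools import combinations

def atoms_below(partition):
    """Atoms [ij] finer than the partition: pairs {i, j} inside one block."""
    return sorted(pair for block in partition
                  for pair in combinations(sorted(block), 2))

top = [{1, 2, 3}]                 # coarsest partition of {1, 2, 3}
print(atoms_below(top))           # its maximal join-decomposition
print(len(atoms_below(top)))      # the number of atoms involved (the size)
```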

2.1 Lattice functions 

In order to consider alternative weights over the edges of the Hasse
diagram, it is necessary to deal with different lattice functions
$f : X \to \mathbb{R}_+$. Firstly, from a geometric perspective,
$f \in \mathbb{R}_+^{|X|}$ is a point in a vector space. A well-known
basis of this vector space is $\{\zeta_x : x \in X\}$, where

\[ \zeta_x(y) = \begin{cases} 1 & \text{if } x \leqslant y, \\ 0 & \text{if } x \not\leqslant y, \end{cases} \qquad \text{for all } y \in X. \]

Thus, any f is a linear combination $f = \sum_{x \in X}\zeta_x\,\mu^f(x)$
of basis elements, with coefficients $\mu^f(x), x \in X$ given by Möbius
inversion $\mu^f : X \to \mathbb{R}$, where the latter obeys the following
recursion: $\mu^f(x) = f(x) - \sum_{y < x}\mu^f(y)$ for all $x \in X$,
hence $\mu^f(x_\bot) = f(x_\bot)$ and
$f(x) = \sum_{y \leqslant x}\mu^f(y)$ for all $x \in X$ (see [?]).

A lattice function f is said to be:

• strictly order-preserving if $f(x) > f(y)$ for all $x,y \in X$ such that $x > y$,

• strictly order-inverting if $f(x) > f(y)$ for all $x,y \in X$ such that $x < y$,

• supermodular if $f(x \vee y) + f(x \wedge y) - f(x) - f(y) \geqslant 0$ for all $x,y \in X$,

• submodular if $f(x \vee y) + f(x \wedge y) - f(x) - f(y) \leqslant 0$ for all $x,y \in X$,

• modular if $f(x \vee y) + f(x \wedge y) - f(x) - f(y) = 0$ for all $x,y \in X$,

• totally positive if $\mu^f(x) \geqslant 0$ for all $x \in X$.

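Observation: if f is totally positive, then it is supermodular. To see
this, firstly note that if x and y are comparable, say $x \geqslant y$,
then there is nothing to show, as $x \wedge y = y$ and $x \vee y = x$, and
thus the inequality defining supermodularity is satisfied with equality.
Apart from this trivial case, if x and y are incomparable, i.e.
$x \not\geqslant y \not\geqslant x$, then substituting the general Möbius
inversion formula above, i.e. $f(x) = \sum_{y \leqslant x}\mu^f(y)$, into
the inequality defining supermodularity yields

\[ f(x \vee y) + f(x \wedge y) - f(x) - f(y) = \sum_{z \leqslant x \vee y}\mu^f(z) + \sum_{z \leqslant x \wedge y}\mu^f(z) - \sum_{z \leqslant x}\mu^f(z) - \sum_{z \leqslant y}\mu^f(z) \]

\[ = \sum_{x \wedge y < z \leqslant x \vee y}\mu^f(z) - \sum_{x \wedge y < z \leqslant x}\mu^f(z) - \sum_{x \wedge y < z \leqslant y}\mu^f(z) \]

\[ = \sum_{\substack{x \wedge y < z \leqslant x \vee y \\ x \not\geqslant z \not\leqslant y}}\mu^f(z) \geqslant 0, \]

where of course $[x \wedge y, x] \cap [x \wedge y, y] = \{x \wedge y\}$ by
definition of meet.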

Further lattice functions to be considered are symmetric ones, i.e. those
that are invariant under the action of the symmetric group $S(N)$
consisting of all $n!$ permutations $\pi : N \to N$ (see [?, p. 161]).
Symmetric functions are generally very important in mathematics; for
reasons of space only essential facts are presented here, with focus on
lattices $(2^N,\cap,\cup)$ and $(\mathcal{P}^N,\wedge,\vee)$. For any
$A \in 2^N, \pi \in S(N)$, let $\pi A = \{\pi^{-1}(i) : i \in A\}$, where
$j = \pi^{-1}(i)$ is the index mapped into the i-th position by $\pi$. A
set function $v : 2^N \to \mathbb{R}_+$ is symmetric if
$v(A) = v(\pi A)$ for all $A \in 2^N, \pi \in S(N)$. Thus, v is symmetric
if $v(A) = v(B)$ for all $A,B \in 2^N$ such that $|A| = |B|$. As for
partitions, for every $P \in \mathcal{P}^N$ let
$c^P = (c^P_1,\ldots,c^P_n) \in \mathbb{Z}_+^n$ be the class or type of P
(see [?]), that is to say $c^P_k = |\{B : B \in P, |B| = k\}|$,
$1 \le k \le n$. For all $\{P_1,\ldots,P_{|P|}\} = P$ and
$\pi \in S(N)$, let $\pi P = \{\pi P_1,\ldots,\pi P_{|P|}\}$. A partition
function $h : \mathcal{P}^N \to \mathbb{R}_+$ is symmetric if
$h(P) = h(\pi P)$ for all $P \in \mathcal{P}^N, \pi \in S(N)$. Thus, h is
symmetric if $h(P) = h(Q)$ for all $P,Q \in \mathcal{P}^N$ such that
$c^P = c^Q$.

2.2 Hamming distance between subsets 

First of all recall that measures of the distance between elements of any
(i.e. possibly non-ordered) set are referred to as "Hamming distances"
when these elements are represented as arrays or matrices and the distance
between two of them is the number of entries where their array or matrix
representations differ. The issue introduced in Section 1, namely how to
measure a distance $d(P,Q)$ between any two partitions
$P,Q \in \mathcal{P}^N$, is firstly addressed in the following Section 3
by reproducing the traditional Hamming distance $|A \Delta B|$ between
subsets $A,B \in 2^N$, where
$|A \Delta B| = |A \cup B| - |A \cap B|$. This distance measure can also
be expressed as
$|A \Delta B| = |A \setminus B| + |B \setminus A| = r(A \cup B) - r(A \cap B)$,
where $r : 2^N \to \mathbb{Z}_+$ is the rank function, i.e. $r(A) = |A|$
for all $A \in 2^N$. The essential combinatorial feature of
$|A \Delta B|$ is that it counts how many atoms $\{i\}, i \in N$ of
Boolean lattice $(2^N,\cap,\cup)$ are included in either A or else B but
not in both. Also, $|A \Delta B|$ is a Hamming distance since subsets
$A,B \in 2^N$ are represented as Boolean n-vectors
$\chi_A,\chi_B \in \{0,1\}^n$, with characteristic function
$\chi_A : N \to \{0,1\}$ defined by $\chi_A(i) = 1$ if $i \in A$ and
$\chi_A(i) = 0$ if $i \in N \setminus A = A^c$, for all $A \in 2^N$. Thus,
$|A \Delta B| = \sum_{i \in N}(\chi_A(i) - \chi_B(i))^2$ is precisely the
number of entries where $\chi_A$ and $\chi_B$ differ [6]. Evidently,
characteristic functions $\chi_A, A \in 2^N$ provide a bijection between
the $2^n$-set of subsets $A \in 2^N$ and the vertices
$\chi_A \in \{0,1\}^n$ of the n-dimensional unit hypercube $[0,1]^n$. In
fact, the graph of this latter polytope [?] is the Hasse diagram of
Boolean lattice $(2^N,\cap,\cup)$, the two sharing the same vertices and
edges, and $|A \Delta B|$ is the length of a shortest path connecting
vertices $\chi_A$ and $\chi_B$. Clearly, a shortest path is also a
minimum-weight path as long as each edge has unit weight, which is
precisely what happens when edges are weighted by the rank.
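The shortest-path reading of the Hamming distance between subsets can be checked by a breadth-first search on the Hasse diagram of the Boolean lattice. This is only a sketch under simple assumptions: subsets are Python sets over {1, ..., n}, and the name `hasse_distance` is illustrative.

```python
from collections import deque
from itertools import combinations

def hasse_distance(a, b, n):
    """Shortest-path length between subsets a and b of {1..n} in the Hasse
    diagram of the Boolean lattice (each edge adds or removes one element)."""
    start, goal = frozenset(a), frozenset(b)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return dist[current]
        for i in range(1, n + 1):
            neighbour = current ^ {i}   # toggle element i: one covering step
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    raise ValueError("goal not reachable")

# Compare with the cardinality of the symmetric difference on a 3-set.
subsets = [set(c) for r in range(4) for c in combinations(range(1, 4), r)]
print(all(hasse_distance(a, b, 3) == len(a ^ b)
          for a in subsets for b in subsets))
```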

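For any two points $p,q \in [0,1]^n$ in the unit n-cube, let
$\langle p,q \rangle = \sum_{1 \le i \le n} p_i q_i$ denote their scalar
product. Since $\chi_N \in \{0,1\}^n$ is the n-vector all of whose entries
equal 1, for all $A \in 2^N$ it holds
$r(A) = |A| = \langle \chi_A,\chi_N \rangle$. Three further expressions
for the Hamming distance between subsets $A,B \in 2^N$ are

\[ |A \Delta B| = |A| + |B| - 2|A \cap B| = \langle \chi_A,\chi_N \rangle + \langle \chi_B,\chi_N \rangle - 2\langle \chi_A,\chi_B \rangle \qquad (1) \]

\[ = \langle \chi_A,\chi_N \rangle + \langle \chi_B,\chi_N \rangle - 2\langle \chi_{A \cap B},\chi_N \rangle \qquad (2) \]

\[ = 2|A \cup B| - |A| - |B| = 2\left[\langle \chi_A,\chi_N \rangle + \langle \chi_B,\chi_N \rangle - \langle \chi_A,\chi_B \rangle\right] - \langle \chi_A,\chi_N \rangle - \langle \chi_B,\chi_N \rangle. \]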
Furthermore, the following two observations are immediately checked. 

• $r : 2^N \to \{0,1,2,\ldots,n\}$ is a strictly order-preserving,
symmetric and modular lattice (i.e. set) function, and

• $|\cdot \Delta \cdot| : 2^N \times 2^N \to \{0,1,2,\ldots,n\}$ is a
metric: for all $A,A',B \in 2^N$,

1. $|A \Delta B| = |B \Delta A|$,

2. $|A \Delta B| \geqslant 0$, with equality if and only if $A = B$,

3. $|A \Delta A'| + |A' \Delta B| \geqslant |A \Delta B|$, or triangle inequality.
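Expression (1) and the triangle inequality can be verified exhaustively on a small ground set. The sketch below assumes subsets are Python sets over {1, ..., n}; the helper names `chi`, `dot` and `hamming_via_products` are illustrative only.

```python
from itertools import combinations

def chi(subset, n):
    """Characteristic vector of a subset of {1, ..., n}."""
    return [1 if i in subset else 0 for i in range(1, n + 1)]

def dot(p, q):
    """Scalar product of two n-vectors."""
    return sum(pi * qi for pi, qi in zip(p, q))

def hamming_via_products(a, b, n):
    """Expression (1): <chi_A, chi_N> + <chi_B, chi_N> - 2 <chi_A, chi_B>."""
    ones = [1] * n                      # chi_N
    xa, xb = chi(a, n), chi(b, n)
    return dot(xa, ones) + dot(xb, ones) - 2 * dot(xa, xb)

n = 4
subsets = [set(c) for r in range(n + 1)
           for c in combinations(range(1, n + 1), r)]
print(all(hamming_via_products(a, b, n) == len(a ^ b)
          for a in subsets for b in subsets))
```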

3 Partition distances 

In a (simple) graph $G = (V,E)$ with vertex set $V = \{v_1,\ldots,v_m\}$
the edge set $E \subseteq V_2 = \{\{v_i,v_j\} : 1 \le i < j \le m\}$ is
included in the $\binom{m}{2}$-set of unordered pairs of vertices. As
already mentioned, the complete graph on these m (labelled) vertices is
$K_V = (V,V_2)$, and the Hamming distance $HD(P,Q)$ between partitions
defined in the sequel reproduces $|A \Delta B|$ while taking into account
that partitions of N correspond bijectively to those graphs with vertex
set $V = N$ whose components are each a complete subgraph [5].

The combinatorial analog of $|A \Delta B|$ in terms of partitions P, Q,
namely the number of atoms of $(\mathcal{P}^N,\wedge,\vee)$ finer than
either P or Q but not finer than both, exists in the literature [?], but
is commonly not recognised to be such an analog. Conversely, the name
"Hamming distance between partitions" is often customarily maintained for
a metric obtained by representing partitions P as Boolean matrices
$M^P \in \{0,1\}^{n \times n}$, although the latter correspond in fact to
generic binary relations on N [33, p. 393]. Since partitions only
correspond to equivalence relations, it is readily seen that there are
$2^{n^2} - \mathcal{B}_n$ binary relations which are not equivalence
relations, yielding both conceptual and quantitative ambiguities (detailed
below). In addition, the metric obtained by representing partitions P as
matrices $M^P$ does not yield any shortest path between vertices of the
Hasse diagram of partitions. In general, it seems desirable that the
distance between elements of an ordered set (such as $2^N$ and
$\mathcal{P}^N$) is measured in terms of the order relation, like
$|A \Delta B|$ is specified in terms of $\supseteq$. That is to say, in
formal notation,
$|A \Delta B| = |\{\{i\} : A \supseteq \{i\} \not\subseteq B\}| + |\{\{i\} : A \not\supseteq \{i\} \subseteq B\}|$.

There exist many partition distance measures available in the literature
[13, Sections 10.2, 10.3, pp. 191-193], [53, Chapter 5], [?]. Towards a
clear disambiguation between the so-called Hamming distance between
(matrices representing) partitions [?] mentioned above and what is
proposed here, recall that a binary relation $\mathcal{R}$ on N is a
subset $\mathcal{R} \subseteq N \times N$ of ordered pairs $(i,j)$ of
elements $i,j \in N$ (hence unordered pairs $\{i,j\}$ satisfy
$\{i,j\} = \{j,i\}$, while $(i,j) \neq (j,i)$ for ordered ones). The
collection of all such binary relations is a Boolean lattice
$(2^{N \times N},\cap,\cup)$. If symmetry
$(i,j) \in \mathcal{R} \Rightarrow (j,i) \in \mathcal{R}$ and transitivity
$(i,j),(j,k) \in \mathcal{R} \Rightarrow (i,k) \in \mathcal{R}$ hold, then
$\mathcal{R}$ is an equivalence relation, or a partition of N into
equivalence classes: $\supseteq$-maximal subsets $A \in 2^N$ such that
$(i,j),(j,i) \in \mathcal{R}$ for all $i,j \in A$ are precisely its
blocks. A binary relation $\mathcal{R}$ may be represented as a Boolean
matrix $M^{\mathcal{R}} \in \{0,1\}^{n \times n}$ with entries
$M^{\mathcal{R}}_{ij} = 1$ if $(i,j) \in \mathcal{R}$ and
$M^{\mathcal{R}}_{ij} = 0$ if $(i,j) \notin \mathcal{R}$. Now let two
equivalence relations $\mathcal{R}^P, \mathcal{R}^Q$ have associated
partitions P, Q and representing matrices $M^P, M^Q$. The distance
$d(\mathcal{R}^P,\mathcal{R}^Q)$ between subsets
$\mathcal{R}^P,\mathcal{R}^Q \in 2^{N \times N}$ can be computed as
$d(\mathcal{R}^P,\mathcal{R}^Q) = |\mathcal{R}^P \Delta \mathcal{R}^Q| = |\mathcal{R}^P \cup \mathcal{R}^Q| - |\mathcal{R}^P \cap \mathcal{R}^Q|$.
This is the number of 1s in matrix $M^P + M^Q$ modulo 2. While providing a
distance between partitions P and Q, this is in fact the traditional
Hamming distance between certain subsets
$\mathcal{R}^P,\mathcal{R}^Q \in 2^{N \times N}$, while generic such
subsets $\mathcal{R} \in 2^{N \times N}$ correspond to partitions only in
very special cases, as lattice $(2^{N \times N},\cap,\cup)$ contains
$2^{n^2} - \mathcal{B}_n$ elements, or binary relations, that do not
correspond to partitions, or equivalence relations. The argument also
applies when partitions are represented as Boolean $n \times n$ matrices
through the complement $\bar{\mathcal{R}}$ of equivalence relations
$\mathcal{R}$, known as apartness relations in computer science [?], i.e.
$\bar{\mathcal{R}}^P = (N \times N) \setminus \mathcal{R}^P$ (this is
detailed below).
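The matrix-based distance just described can be sketched as follows, assuming partitions of {1, ..., n} are given as lists of Python sets; `relation_matrix` and `matrix_hamming` are hypothetical helper names. Since the matrices are symmetric with an all-ones diagonal, every unordered pair {i, j} on which the two partitions disagree is counted twice.

```python
def relation_matrix(partition, n):
    """Boolean n x n matrix of the equivalence relation of a partition."""
    block_of = {i: k for k, block in enumerate(partition) for i in block}
    return [[1 if block_of[i] == block_of[j] else 0
             for j in range(1, n + 1)] for i in range(1, n + 1)]

def matrix_hamming(p, q, n):
    """Number of 1s in M^P + M^Q modulo 2."""
    mp, mq = relation_matrix(p, n), relation_matrix(q, n)
    return sum((mp[i][j] + mq[i][j]) % 2
               for i in range(n) for j in range(n))

P = [{1, 2}, {3}]
Q = [{1}, {2, 3}]
print(matrix_hamming(P, Q, 3))
```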

The point is that in finite sets such as $2^N$, $\mathcal{P}^N$ and
$2^{N \times N}$, where there is no "natural" metric (like the Euclidean
norm in $\mathbb{R}^n$), the distance between elements x and y must be
quantified, in some way, by the number of elements z between x and y,
where "between" means that z must be comparable, in terms of the order
relation, with x and/or y. To achieve this, in the present setting,
consider that the partition lattice $(\mathcal{P}^N,\wedge,\vee)$ is a
matroid (see [2, 51] and above). However regarded, it is necessarily
embedded into a larger subset lattice, with which some elements are shared
while some others are not. Apart from binary relations just described, a
naive example comes from noticing that partitions P are collections of
subsets, i.e. $P \in 2^{2^N}$, and thus the distance between P and Q might
be computed as the Hamming distance $|P \Delta Q|$ between elements of
subset lattice $(2^{2^N},\cap,\cup)$, i.e. the number of subsets
$A \in 2^N$ that are blocks of either one but not both. Again, there are
really many (i.e. $2^{2^n} - \mathcal{B}_n$) set systems (or collections
$\mathcal{S} \in 2^{2^N}$ of subsets) that do not correspond to
partitions. This feature is maintained even when P and Q are decomposed as
joins of atoms, for they generally admit several such join-decompositions
[?]. Yet, when regarded from this perspective partition lattice
$(\mathcal{P}^N,\wedge,\vee)$ is seen to be included in subset lattice
$(2^{N_2},\cap,\cup)$, with the two sharing the same $\binom{n}{2}$ atoms.
In fact, $2^{N_2}$ is the minimal Boolean lattice including the partition
lattice. Accordingly, the Hamming distance between partitions HD proposed
below relies precisely on representing partitions as Boolean
$\binom{n}{2}$-vectors, although only
$\mathcal{B}_n < 2^{\binom{n}{2}}$ distinct such vectors correspond to
partitions. In particular, HD is the traditional Hamming distance
$|E \Delta E'|$ between edge sets $E,E' \in 2^{N_2}$ of graphs on vertex
set N, with the latter corresponding to partitions only when in both
graphs $G = (N,E)$, $G' = (N,E')$ each component is a complete subgraph.

3.1 Hamming distance between partitions 

In combinatorial theory, both $(2^N,\cap,\cup)$ and
$(\mathcal{P}^N,\wedge,\vee)$ are geometric lattices [?, p. 54]. As such,
they are atomic, meaning that every element is decomposable as a join of
atoms (see above). The rank function
$r : \mathcal{P}^N \to \mathbb{Z}_+$ of the partition lattice is
$r(P) = n - |P|$, with height $r(P^\top) = n - 1$ and $r(P_\bot) = 0$ for
the top and bottom elements, respectively. As already outlined, atoms are
immediately above $P_\bot$, with rank 1, in the associated Hasse diagram
[?, p. 889], where coarser partitions occupy upper levels. Thus, atoms are
those partitions consisting of $n - 1$ blocks, namely $n - 2$ singletons
and one pair. These $\binom{n}{2}$ pairs $\{i,j\} \in N_2$ are the same
atoms as in Boolean lattice $(2^{N_2},\cap,\cup)$. Notationally, it is now
convenient to let $[ij] \in \mathcal{P}^N$ be the atom where the unique
2-cardinal block is pair $\{i,j\} \in [ij]$ (this is denoted by
$\pi_{xy}$ in [?, p. 150], where x, y are elements of the partitioned set
while $\pi$ denotes the generic partition).

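In order to have a combinatorially congruent reproduction of the Hamming
distance between partitions, let
$\mathcal{P}^N_{\mathcal{A}} = \{[ij] : 1 \le i < j \le n\}$ be the
$\binom{n}{2}$-set of atoms of the partition lattice, with isomorphism
$\mathcal{P}^N_{\mathcal{A}} \cong N_2$. The analog of characteristic
function $\chi_A$ is indicator function
$I_P : \mathcal{P}^N_{\mathcal{A}} \to \{0,1\}$, defined by

\[ I_P([ij]) = \begin{cases} 1 & \text{if } P \geqslant [ij], \\ 0 & \text{if } P \not\geqslant [ij], \end{cases} \qquad \text{for all } P \in \mathcal{P}^N, [ij] \in \mathcal{P}^N_{\mathcal{A}}. \]

In words, if pair $\{i,j\}$ is included in some block A of P, i.e.
$\{i,j\} \subseteq A \in P$, then partition P is coarser than atom
$[ij]$, and the corresponding position $I_P([ij])$ of indicator array
$I_P$ has entry 1. Otherwise, that position is 0. For the top partition
$P^\top = \{N\}$, indicator function $I_{P^\top}$ is the
$\binom{n}{2}$-vector with all entries equal to 1. For the bottom
partition $P_\bot$, analogously
$I_{P_\bot} \in \{0,1\}^{\binom{n}{2}}$ is the $\binom{n}{2}$-vector all
of whose entries equal 0. The number
$s(P) = |\{[ij] : [ij] \leqslant P\}|$ of atoms finer than any partition P
is [43] the size $s : \mathcal{P}^N \to \mathbb{Z}_+$ mentioned in
Section 1, i.e.

\[ s(P) = \sum_{A \in P}\binom{|A|}{2} = \sum_{1 \le k \le n} c^P_k \binom{k}{2} = \langle I_P, I_{P^\top} \rangle. \]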


While the cardinality $|A| = \langle \chi_A,\chi_N \rangle$ of subsets
takes every integer value between 0 and n, the size
$s(P) = \langle I_P, I_{P^\top} \rangle$ of partitions does not do the
same between 0 and $\binom{n}{2}$. Minimally, this is already observable
for $N = \{1,2,3\}$, as there are $\mathcal{B}_3 = 5$ partitions: the
finest $\{\{1\},\{2\},\{3\}\}$ and coarsest $\{1,2,3\}$ ones, together
with the $\binom{3}{2} = 3$ atoms $[12] = \{\{1,2\},\{3\}\}$,
$[13] = \{\{1,3\},\{2\}\}$ and $[23] = \{\{2,3\},\{1\}\}$. Thus, there is
no partition with size equal to 2, as
$[12] \vee [23] = [12] \vee [13] = [13] \vee [23] = \{1,2,3\} = [12] \vee [13] \vee [23]$.
Available sizes of partitions of an n-set, for $1 \le n \le 7$, are in
Table 1 below.


Table 1: Available sizes of partitions of an n-set, 1 <= n <= 7.

n    {s(P) : P in P^N} (available sizes)
1    {0}
2    {0,1}
3    {0,1,3}
4    {0,1,2,3,6}
5    {0,1,2,3,4,6,10}
6    {0,1,2,3,4,6,7,10,15}
7    {0,1,2,3,4,5,6,7,9,10,11,15,21}

Both lattices $\mathcal{P}^N$ and $2^{N_2}$ are atomic, with every element
$P \in \mathcal{P}^N$ and $E \in 2^{N_2}$ admitting a decomposition as a
join of atoms. Yet, while subsets $E \in 2^{N_2}$ (or edge sets of graphs
with vertex set N) admit a unique such decomposition, namely
$E = \bigcup_{\{i,j\} \in E}\{\{i,j\}\}$, partitions generally admit
several such decompositions $P = [ij]_1 \vee \cdots \vee [ij]_k$. For
$n = 3$ as above, the coarsest partition $\{1,2,3\}$ decomposes either as
the join of any two atoms, or else as the join of all the three available
atoms at once. In particular, the rank $r(P)$ of P is the minimum number
of atoms involved in a join-decomposition of P, while the size $s(P)$ is
the maximum number of atoms involved in such a decomposition. Hence, the
coarsest partition $\{1,2,3\}$ of a 3-cardinal set has rank
$r(\{1,2,3\}) = 3 - 1 = 2$ and size $s(\{1,2,3\}) = 3 = \binom{3}{2}$.
The rank $r(P)$ of partitions is well-known to be strictly
order-preserving, symmetric and submodular, while the size $s(P)$ is
strictly order-preserving, symmetric and supermodular. This is shown
below.

Lemma 1 The size is a strictly order-preserving partition function: if
$P > Q$, then $s(P) > s(Q)$, for all $P,Q \in \mathcal{P}^N$.

Proof: If $P > Q$, then every $A \in P$ is the union of some
$B_1,\ldots,B_{k_A} \in Q$, i.e. $A = B_1 \cup \cdots \cup B_{k_A}$, with
$k_A > 1$ for at least one $A \in P$. The union $B \cup B'$ of any
$B,B' \in Q$ increases the size by

\[ \binom{|B| + |B'|}{2} - \binom{|B|}{2} - \binom{|B'|}{2} = |B||B'|, \]

which is strictly positive as blocks are non-empty. ■

In order to reproduce expressions (1-2) of Section 2.2 above, Hamming
distance HD between partitions has to count the number of atoms finer than
either one of any two partitions but not finer than both. Thus, in terms
of cardinalities of subsets of atoms, distance
$HD : \mathcal{P}^N \times \mathcal{P}^N \to \mathbb{Z}_+$ is given by

\[ HD(P,Q) = |\{[ij] : P \geqslant [ij] \not\leqslant Q\}| + |\{[ij] : P \not\geqslant [ij] \leqslant Q\}|. \]

The size and the indicator function allow one to obtain HD as follows:

\[ HD(P,Q) = s(P) + s(Q) - 2s(P \wedge Q) = \langle I_P, I_{P^\top} \rangle + \langle I_Q, I_{P^\top} \rangle - 2\langle I_P, I_Q \rangle. \qquad (3) \]

Also note that $P \wedge Q = \bigvee_{P \geqslant [ij] \leqslant Q}[ij]$,
and this is the maximal decomposition of $P \wedge Q$ as a join of atoms,
namely that involving $s(P \wedge Q)$ atoms. Therefore,

\[ HD(P,Q) = \langle I_P, I_{P^\top} \rangle + \langle I_Q, I_{P^\top} \rangle - 2\langle I_{P \wedge Q}, I_{P^\top} \rangle. \qquad (4) \]

In view of expressions (1-4), there seems to remain no doubt that, from a
combinatorial perspective, $HD(P,Q)$ is in fact the faithful translation
of the traditional Hamming distance $|A \Delta B|$ from subsets A, B to
partitions P, Q.
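A short sketch may help to see expression (3) at work. It assumes partitions of {1, ..., n} are lists of Python sets; the helper names `indicator`, `size`, `meet` and `hd` are chosen for illustration only.

```python
from itertools import combinations

def indicator(partition, n):
    """Indicator vector I_P over the atoms [ij], 1 <= i < j <= n."""
    block_of = {i: k for k, block in enumerate(partition) for i in block}
    return [1 if block_of[i] == block_of[j] else 0
            for i, j in combinations(range(1, n + 1), 2)]

def size(partition):
    """s(P): number of atoms finer than P."""
    return sum(len(b) * (len(b) - 1) // 2 for b in partition)

def meet(p, q):
    """Coarsest partition finer than both: non-empty block intersections."""
    return [a & b for a in p for b in q if a & b]

def hd(p, q, n):
    """HD as the number of entries where I_P and I_Q differ."""
    return sum(x != y for x, y in zip(indicator(p, n), indicator(q, n)))

P = [{1, 2}, {3, 4}]
Q = [{1, 2, 3}, {4}]
print(hd(P, Q, 4), size(P) + size(Q) - 2 * size(meet(P, Q)))
```

In this example the atoms finer than exactly one of the two partitions are [34], [13] and [23], so both printed values equal 3.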

3.2 Two further partition distances 

Two non-Hamming partition distances are now briefly introduced, since they
provide a term of comparison for the following sections and also in view
of the recent literature in bioinformatics cited in Section 1. Any subset
A has a unique complement $A^c = N \setminus A$. For all partitions P and
all non-empty subsets $A \neq \emptyset$, let
$P^A = \{B \cap A : B \in P, \emptyset \neq B \cap A\}$ denote the
partition of A induced by P. Maximum matching distance $MMD(P,Q)$ between
partitions P, Q is

\[ MMD(P,Q) = \min\{|A^c| : \emptyset \neq A \subseteq N, P^A = Q^A\}. \qquad (5) \]

This is the minimum number of elements $i \in N$ that must be deleted in
order for the two residual induced partitions to coincide. Also,
$MMD(P,Q)$ "is the minimum number of elements that must be moved between
clusters of P so that the resulting partition equals Q" [?, p. 160]. It is
computable as a maximum matching or assignment problem [12],
[?, Chapter 11]. In a graph a matching is a set of pairwise disjoint
edges, i.e. the endpoints are all different vertices. Now consider the
bipartite graph $G = (P \cup Q, E)$ with $|P| + |Q|$ vertices, one for
each block of each partition, and join any two of them $A \in P$ and
$B \in Q$ with an edge $\{A,B\} \in E$ if $A \cap B \neq \emptyset$. In
addition, let $|A \cap B|$ be the weight of the edge. Then, determining
$MMD(P,Q)$ amounts to finding a maximum-weight matching $E^*$ in G, that
is one where the sum $\sum_{\{A,B\} \in E^*}|A \cap B|$ of weights is
maximal.
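In fact, the minimum number $MMD(P,Q)$ of elements that must be removed
for the two residual partitions to coincide is $n - \sum_{\{A,B\} \in E^*} |A \cap B|$. Equivalently,
it is half the sum $\sum_{\{A,B\} \in E^*} |A \Delta B|$ over all selected edges of the cardinality of
the symmetric difference between the associated endpoints, provided that each
unmatched block is paired with the empty set.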
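
The matching formulation can be sketched in a few lines of Python. The fragment below is only an illustration for partitions with few blocks: it finds a maximum-weight matching by brute force over assignments (rather than with a dedicated assignment algorithm) and relies on the identity that MMD equals $n$ minus the maximum matching weight. The helper names `parse`, `max_matching_weight` and `mmd` are hypothetical.

```python
from itertools import permutations

def parse(text):
    """Turn a string such as '123|456|7' into a list of blocks (frozensets)."""
    return [frozenset(block) for block in text.split("|")]

def max_matching_weight(p, q):
    """Brute-force maximum-weight matching between the blocks of p and q,
    where the pair (A, B) has weight |A & B|. Exponential: small inputs only."""
    small, large = (p, q) if len(p) <= len(q) else (q, p)
    best = 0
    # Assign each block of the smaller partition to a distinct block of the
    # larger one; pairs with empty intersection simply contribute weight 0.
    for chosen in permutations(large, len(small)):
        weight = sum(len(a & b) for a, b in zip(small, chosen))
        best = max(best, weight)
    return best

def mmd(p, q):
    """Maximum matching distance: elements left uncovered by the best matching."""
    n = sum(len(block) for block in p)
    return n - max_matching_weight(p, q)

print(mmd(parse("12|3"), parse("1|23")))
```

For larger instances an assignment solver would replace the brute-force loop, but the returned value is the same.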

Another important measure of the distance between any two partitions $P$
and $Q$ is the variation of information $VI(P,Q)$, obtained axiomatically from
information theory (see [31, Expressions (15)-(22), pp. 879-880]). The entropy
$e(P) = -\sum_{A \in P} \frac{|A|}{n} \log\left(\frac{|A|}{n}\right) = -\sum_{1 \leq k \leq n} c_k(P) \frac{k}{n} \log\left(\frac{k}{n}\right)$ of partitions $P$ (binary
logarithm) makes it possible to measure the distance between $P$ and $Q$ as

$$VI(P,Q) = 2e(P \wedge Q) - e(P) - e(Q). \tag{6}$$

Notice that while the range of MMD is $\{0,1,\ldots,n-1\} \subset \mathbb{Z}_+$, VI ranges
in a finite subset of interval $[0, \log n] \subset \mathbb{R}_+$. Most importantly, the entropy
$e(P)$ of partitions $P$ is strictly order-inverting, symmetric and submodular,
with $e(P_\bot) = \log(n)$ and $e(P^\top) = 0$. To see submodularity, simply consider
$N = \{1,2,3\}$ as before, and set $P = [12]$ and $Q = [23]$, yielding $P \wedge Q = P_\bot$
and $P \vee Q = P^\top$. Then,

$$e(P \vee Q) + e(P \wedge Q) - e(P) - e(Q) = -1\log(1) - 3\left(\tfrac{1}{3}\log\tfrac{1}{3}\right) + 2\left(\tfrac{2}{3}\log\tfrac{2}{3} + \tfrac{1}{3}\log\tfrac{1}{3}\right) = \tfrac{4}{3} - \log(3) \approx 1.333 - 1.585 < 0.$$

Finally observe that $-e(\cdot)$, in turn, is strictly order-preserving,
symmetric and supermodular. It can also be anticipated that VI is in the broad
class of metric distances defined in the sequel, but MMD is not.
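
The entropy, expression (6) and the submodularity check above can be reproduced numerically. The following Python fragment is a sketch under the same string encoding of partitions used for illustration elsewhere; the names `parse`, `meet`, `entropy` and `variation_of_information` are assumptions, not notation from the text.

```python
from math import log2

def parse(text):
    """Turn a string such as '12|3' into a list of blocks (frozensets)."""
    return [frozenset(block) for block in text.split("|")]

def meet(p, q):
    """P ^ Q: the non-empty pairwise intersections of blocks."""
    return [a & b for a in p for b in q if a & b]

def entropy(partition):
    """e(P) = -sum over blocks of (|A|/n) log2(|A|/n)."""
    n = sum(len(block) for block in partition)
    return -sum(len(b) / n * log2(len(b) / n) for b in partition)

def variation_of_information(p, q):
    """VI(P, Q) = 2 e(P ^ Q) - e(P) - e(Q), as in expression (6)."""
    return 2 * entropy(meet(p, q)) - entropy(p) - entropy(q)

p, q = parse("12|3"), parse("1|23")
top, bottom = parse("123"), parse("1|2|3")
# Submodularity gap for P = [12], Q = [23]: negative, as computed in the text.
gap = entropy(top) + entropy(bottom) - entropy(p) - entropy(q)
print(gap, variation_of_information(p, q))
```

Here the join and meet of the two atoms are written out by hand as the top and bottom partitions, so no join routine is needed.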


3.3 HD and VI: axioms 

Following [31], attention is now placed on those axioms that characterize both
partition distance measures HD and VI. An alternative axiomatic characterization
of HD appears in [34]. The following proposition may be compared with
[31, pp. 880-881, Property 1].

Proposition 2 HD is a metric: for all $P, P', Q \in \mathcal{P}^N$,

1. $HD(P,Q) = HD(Q,P)$,

2. $HD(P,Q) \geq 0$, with equality if and only if $P = Q$,

3. $HD(P,P') + HD(P',Q) \geq HD(P,Q)$, i.e. triangle inequality.

Proof: The first condition is obvious. In view of lemma 1 above, the second
one is also immediate as $\min\{s(P), s(Q)\} \geq s(P \wedge Q)$. In fact, $HD(P,Q)$ is
the sum $[s(P) - s(P \wedge Q)] + [s(Q) - s(P \wedge Q)]$ of two non-negative integers, while
$\min_{P \neq Q} HD(P,Q) = HD(P_\bot, [ij]) = 1 = s([ij])$ (for any atom $[ij]$). Concerning
triangle inequality, difference

$$HD(P,P') + HD(P',Q) - HD(P,Q) = 2[s(P') - s(P \wedge P') - s(P' \wedge Q) + s(P \wedge Q)]$$

must be shown to be non-negative for all triplets $P, P', Q \in \mathcal{P}^N$. For any $P, Q \in \mathcal{P}^N$,
size $s(P \wedge Q)$ is given, and thus $s(P') - [s(P \wedge P') + s(P' \wedge Q)]$ has to be minimized
by suitably choosing $P'$. Firstly, sum $s(P \wedge P') + s(P' \wedge Q)$ is maximized when
both $P \wedge P' = P$ (or $P' \geq P$) and $P' \wedge Q = Q$ (or $P' \geq Q$) hold. Secondly, if
$Q \leq P' \geq P$, then the whole difference is minimized when $P' = P \vee Q$. Thus,
HD satisfies triangle inequality as long as the size satisfies supermodularity:
$s(P \vee Q) - s(P) - s(Q) + s(P \wedge Q) \geq 0$ for all $P, Q \in \mathcal{P}^N$. The simplest way
to see that this is indeed the case is by focusing on Möbius inversion of lattice
(or more generally poset) functions (see SS] and above). By definition, the
size $s(\cdot)$ has Möbius inversion $\mu^s : \mathcal{P}^N \to \{0,1\}$ given by $\mu^s(P) = 1$ if $P$ is an
atom (i.e. $P = [ij] \in \mathcal{P}^N$ or $r(P) = 1$), and $\mu^s(P) = 0$ otherwise. In fact,
$s(P) = \sum_{Q \leq P} \mu^s(Q)$ for all $P \in \mathcal{P}^N$. The size thus satisfies a sufficient (but
not necessary) condition for supermodularity, in that its Möbius inversion takes
only non-negative values (see Section 2.1). This completes the proof. ■
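
The three metric axioms can also be checked exhaustively on a small ground set. The Python fragment below is a brute-force sketch for $N = \{1,2,3,4\}$ (15 partitions), computing HD as a symmetric difference of atom sets; the helper names `set_partitions`, `atoms_below`, `hd` and `meet` are illustrative assumptions. It also checks that the triangle inequality holds with equality when the intermediate partition is the meet.

```python
from itertools import combinations, product

def set_partitions(elements):
    """Yield every partition of `elements` as a frozenset of frozenset blocks."""
    if not elements:
        yield frozenset()
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        yield smaller | {frozenset([first])}           # `first` in a block of its own
        for block in smaller:                           # or added to an existing block
            yield (smaller - {block}) | {block | {first}}

def atoms_below(partition):
    """Atoms [ij] finer than the partition: pairs of elements sharing a block."""
    return {frozenset(pair) for block in partition
            for pair in combinations(sorted(block), 2)}

def hd(p, q):
    """HD(P, Q) as the symmetric difference between the two sets of atoms."""
    return len(atoms_below(p) ^ atoms_below(q))

def meet(p, q):
    """P ^ Q: the non-empty pairwise intersections of blocks."""
    return frozenset(a & b for a in p for b in q if a & b)

partitions = list(set_partitions((1, 2, 3, 4)))
pairs = list(product(partitions, repeat=2))
symmetric = all(hd(p, q) == hd(q, p) for p, q in pairs)
identity = all((hd(p, q) == 0) == (p == q) for p, q in pairs)
triangle = all(hd(p, r) + hd(r, q) >= hd(p, q)
               for p, r, q in product(partitions, repeat=3))
via_meet = all(hd(p, meet(p, q)) + hd(meet(p, q), q) == hd(p, q) for p, q in pairs)
print(len(partitions), symmetric, identity, triangle, via_meet)
```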

Triangle inequality is satisfied with equality by both HD and VI as long as
$P' = P \wedge Q$ (for VI, see [31, pp. 883, 888], Properties 6 and 10(A.2)).

Proposition 3 HD satisfies horizontal collinearity:

$$HD(P, P \wedge Q) + HD(P \wedge Q, Q) = HD(P,Q) \text{ for all } P, Q \in \mathcal{P}^N.$$

Proof: $HD(P, P \wedge Q) + HD(P \wedge Q, Q) = [s(P) - s(P \wedge Q)] + [s(Q) - s(P \wedge Q)]$
as well as $HD(P,Q) = s(P) + s(Q) - 2s(P \wedge Q)$. ■

Briefly anticipating the forthcoming analysis, it may be noted that horizontal
collinearity may well be conceived in terms of the join, rather than the meet, of
any two partitions, since it is not hard to define distances $d : \mathcal{P}^N \times \mathcal{P}^N \to \mathbb{R}_+$
satisfying triangle inequality with equality when $P' = P \vee Q$; that is to say,
$d(P, P \vee Q) + d(P \vee Q, Q) = d(P,Q)$ for all $P, Q \in \mathcal{P}^N$. This is in fact the
so-called $B_\vee$ “betweenness” relation proposed in [38, p. 176].

Collinearity also applies to distances between partitions $P, Q$ that are
comparable, i.e. either $P \geq Q$ or $Q \geq P$. Firstly consider the case involving the top
$P^\top$ and bottom $P_\bot$ elements (for VI, see [31, p. 888], property 10(A.1)).

Proposition 4 HD satisfies vertical collinearity:

$$HD(P_\bot, P) + HD(P, P^\top) = HD(P_\bot, P^\top) \text{ for all } P \in \mathcal{P}^N.$$

Proof: $HD(P_\bot, P) + HD(P, P^\top) = s(P) + s(P^\top) - s(P) = s(P^\top)$ independently
of $P$, as well as $HD(P_\bot, P^\top) = s(P^\top) = \binom{n}{2}$. ■

Vertical collinearity may be generalized for arbitrary comparable partitions
$P^\top \geq P \geq Q \geq P_\bot$, in that $HD(Q,P') + HD(P',P) = HD(Q,P)$ for all
$P' \in [Q,P]$, where $[Q,P] = \{P' : Q \leq P' \leq P\}$ is an interval or segment [IS]
of $(\mathcal{P}^N, \wedge, \vee)$ (see above). In fact, this is precisely the “interval betweenness”
property considered in [38, p. 179] (for valuations of distributive lattices).

4 Distances between complementary partitions

The distance between the bottom and top elements in vertical collinearity leads
to regard such lattice elements as complements, thereby focusing on the
distance between other, generic complements. Maintaining the traditional Hamming
distance between subsets as the fundamental benchmark, it must be taken
into account that the subset and partition lattices are very different in terms of
complementation. In particular, every subset $A \in 2^N$ has a unique complement
$A^c$ and the distance between any two such complements equals the distance
between the bottom and top elements, i.e. $|A \Delta A^c| = n = |N \Delta \emptyset|$ for all $A \in 2^N$.

Conversely, partitions $P$ generally have several and quite different complements
[2], which are all those $Q$ such that $P \wedge Q = P_\bot$ as well as $P \vee Q = P^\top$. In
statistical classification, partitions $P, Q$ satisfying only the former condition, i.e.
$P \wedge Q = P_\bot$, are commonly referred to as “dual partitions” and investigated
as those where the adjusted Rand index ARI takes negative values; see
[23, pp. 237-238, 389], [22, pp. 429-430] and [32]. Apart from this, concerning
complementation and partition distances MMD, VI and HD, the former
measures the distance between any two complements $P, Q$ solely through their
cardinalities $|P|, |Q|$, while VI and HD provide a fine distinction between different
complements, and also agree on which are closer and which are remoter. The
issue may be exemplified with $N = \{1,\ldots,7\}$ and partitions $P = 123|456|7$ and
$P^* = 147|2|3|5|6$ and $P_* = 1|2|34|5|67$ (where vertical bar | separates blocks).
Both $P^*$ and $P_*$ are complements of $P$, that is $P \wedge P^* = P \wedge P_* = P_\bot$ and
$P \vee P^* = P \vee P_* = P^\top$. Distances MMD, VI and HD are:


$$MMD(P,P_*) = 4 = MMD(P,P^*),$$

$$VI(P,P_*) = \frac{6\log 6 - 2}{7} \approx 1.93 < 2.04 \approx \frac{9\log 3}{7} = VI(P,P^*),$$

$$HD(P,P_*) = 8 < 9 = HD(P,P^*).$$


Concerning MMD, this example generalizes as follows.

Proposition 5 For any two complementary partitions $P, Q \in \mathcal{P}^N$,

$$MMD(P,Q) = \max\{r(P), r(Q)\}. \tag{7}$$

Proof: If $P \wedge Q = P_\bot$, then every edge $\{A,B\} \in E \subseteq P \times Q$ of the bipartite
graph $G = (P \cup Q, E)$ defined in Section 3.2 above has unit weight $1 = |A \cap B|$.
Hence, a maximum-weight matching simply is one including the maximum
number of feasible edges. Such a number is $\sum_{A \in P \vee Q} \min\{|P^A|, |Q^A|\}$, because
each block (of either partition) can be the endpoint of at most one edge included
in a matching. Also, the number of elements $i \in N$ that must be deleted for
the two residual partitions to coincide is $\sum_{A \in P \vee Q} (|A| - \min\{|P^A|, |Q^A|\})$. On
the other hand, $P \vee Q = P^\top$ entails

$$\sum_{A \in P \vee Q} (|A| - \min\{|P^A|, |Q^A|\}) = n - \min\{|P|, |Q|\} = \max\{r(P), r(Q)\}$$

as desired. ■

As shown by the above example, a partition generally has different
complements with different classes. The set of complements of any partition $P$ is
denoted by $CO(P) = \{Q : P \wedge Q = P_\bot, P \vee Q = P^\top\}$. A modular element of the
partition lattice HIMlIil] is any $P \in \mathcal{P}^N$ where all blocks are singletons apart
from only one, at most, i.e. $\sum_{1 < k \leq n} c_k(P) \leq 1$. The sublattice $\mathcal{P}^N_{mod} \subset \mathcal{P}^N$
consisting of modular elements contains the bottom and top elements, together
with all partitions of the form $\{A\} \cup P_\bot^{A^c}$ with $1 < |A| < n$, where $P_\bot^{A^c}$ is
the finest partition of $A^c$. Hence there are $2^n - n$ modular partitions (with
$\mathcal{P}^N_{mod} = \mathcal{P}^N$ for $n \leq 3$). Here, the main link between modular elements and
complementation is that an element is modular if and only if no two of its
complements are comparable [JHl Theorem 1]. Therefore, if $P \notin \mathcal{P}^N_{mod}$, then there
are $Q, Q' \in CO(P)$ such that $Q > Q'$. It seems thus important that the distance
between $P$ and $Q$ differs from the distance between $P$ and $Q'$. The following
result bounds the Hamming distance HD between a partition and any of its
complements.

Proposition 6 For all $P \in \mathcal{P}^N$, if $Q \in CO(P)$, then

$$s(P) + |P| - 1 \leq HD(P,Q) \leq s(P) + \binom{|P|}{2},$$

where the upper bound is always tight, while the lower one is tight only if

$$c_1(P) \leq 2 + \sum_{1 < k \leq n} (k-2) c_k(P).$$

Proof: Firstly note that if $Q \in CO(P)$, then $HD(P,Q) = s(P) + s(Q)$. Hence,

$$\min\{s(Q) : Q \in CO(P)\} \leq HD(P,Q) - s(P) \leq \max\{s(Q) : Q \in CO(P)\}.$$

Any complement of partition $P = \{A_1, \ldots, A_{|P|}\}$ has join-decompositions
minimally involving $|P| - 1$ atoms $[ij]_1, \ldots, [ij]_{|P|-1} \in \mathcal{P}^N$, with associated pairs
$\{i,j\}_m \in N_2$ such that $|A_m \cap \{i,j\}_m| = 1 = |A_{m+1} \cap \{i,j\}_m|$, $1 \leq m < |P|$.
Considering the upper bound first, observe that size $s([ij]_1 \vee \cdots \vee [ij]_{|P|-1})$
attains its maximum when $|\{i,j\}_m \cap \{i,j\}_{m+1}| = 1$ for all $1 \leq m < |P| - 1$,
in which case $s([ij]_1 \vee \cdots \vee [ij]_m) = \binom{m+1}{2}$ for all $1 \leq m < |P|$. This bound
is tight because such a complement $P^* = [ij]_1 \vee \cdots \vee [ij]_{|P|-1}$ always exists,
whatever the class $c(P)$ of $P$. In fact, $P^* \in \mathcal{P}^N_{mod}$ has $n - |P| + 1$ blocks, out of
which $n - |P|$ are singletons, while the remaining one $B \in P^*$ is $|P|$-cardinal and
satisfies $|B \cap A| = 1$ for all $A \in P$, i.e. $P^* = \{B\} \cup P_\bot^{B^c}$. Thus $s(P^*) = \binom{|P|}{2}$.

Turning to the lower bound, observe that size $s([ij]_1 \vee \cdots \vee [ij]_{|P|-1})$ attains
its minimum, ideally, when $\{i,j\}_m \cap \{i,j\}_{m'} = \emptyset$ for all $1 \leq m < m' < |P|$,
in which case $s([ij]_1 \vee \cdots \vee [ij]_m) = m$ for all $1 \leq m < |P|$. Yet, this is not
always possible because each block $A \in P$ can have non-empty intersection
with a number of pair-wise disjoint pairs $\{i,j\}_m$, $1 \leq m < |P|$ which is bounded
above by $|A|$, entailing that the constraint is given by the number $c_1(P)$ of
singletons $\{i\} \in P$. Specifically, nesting together $\sum_{1 < k \leq n} c_k(P)$ non-singleton
blocks requires $\sum_{1 < k \leq n} c_k(P) - 1$ pairs $\{i,j\}_m$. If these latter have to be
pair-wise disjoint, then the maximum number of elements $j \in N$ in non-singleton
blocks available to match (into pair-wise disjoint pairs) those elements $\{i\} \in P$
in singletons is $\sum_{1 < k \leq n} k c_k(P) - 2\left(\sum_{1 < k \leq n} c_k(P) - 1\right)$. ■

In words, if the number $c_1(P)$ of singleton blocks of partition $P$ exceeds the
number $2 + \sum_{1 < k \leq n} (k-2) c_k(P)$ of elements $j \in N$ available to match, into
pair-wise disjoint pairs, those elements $\{i\} \in P$ in singletons, then basically a
complement $Q$ of $P$, in order to yield the top partition $P^\top$ through the join
$P \vee Q$, must necessarily consist of blocks larger than pairs. Of course, in the
limit, if the blocks of $P = P_\bot$ are all singletons, then the unique complement
$Q = P^\top$ has to be the coarsest or top partition, consisting of a unique block.
Thus, the greater the number $c_1(P)$ of singleton blocks of $P$, the fewer and larger
the blocks $B \in Q$ of a complement $Q$ of $P$ have to be. In this view, the following
Proposition 7 shows that the more the cardinality $|B|$ of these blocks $B \in Q$
is evenly distributed, the lower the size $s(Q)$ of the complement $Q$. In fact,
on any level $\{P : P \in \mathcal{P}^N, |P| = n - k\}$, $0 \leq k < n$ of the partition
lattice, the size attains its maximum value on modular partitions (consisting of
$n - k - 1$ singletons and one $(k+1)$-cardinal block) and its minimum value on
those partitions each of whose blocks has cardinality between $\lfloor \frac{n}{n-k} \rfloor$ and $\lceil \frac{n}{n-k} \rceil$,
where $\lfloor \alpha \rfloor$ is the floor of $\alpha$, i.e. the greatest integer $\leq \alpha$, while $\lceil \alpha \rceil$ is the ceiling
of $\alpha$, i.e. the smallest integer $\geq \alpha$, for $\alpha \in \mathbb{R}_+$. As detailed by Proposition 8
next, the opposite occurs for the entropy of partitions.


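Proposition 7 If $P \in \mathcal{P}^N$ satisfies $2 + \sum_{1 < k \leq n} (k-2) c_k(P) < c_1(P)$, then

$$\min_{P_* \in CO(P)} s(P_*) = \binom{\lfloor n/\theta(P) \rfloor}{2} \left( \theta(P) \left( \left\lfloor \frac{n}{\theta(P)} \right\rfloor + 1 \right) - n \right) + \binom{\lceil n/\theta(P) \rceil}{2} \left( n - \theta(P) \left\lfloor \frac{n}{\theta(P)} \right\rfloor \right),$$

where $\theta(P) = 1 + \sum_{1 < k \leq n} c_k(P)(k-1)$.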

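Proof: If $2 + \sum_{1 < k \leq n} (k-2) c_k(P) < c_1(P)$, then the above proof of Proposition
6 entails that the maximum number $\max\{|Q| : Q \in CO(P)\}$ of blocks of a
complement of $P$ is $\theta(P) := 1 + \sum_{1 < k \leq n} c_k(P)(k-1)$. On the other hand, for
$0 < m \leq n$, among $m$-cardinal partitions $Q$ of an $n$-set the size is minimized
when $|B| \in \left\{ \lfloor \frac{n}{m} \rfloor, \lceil \frac{n}{m} \rceil \right\}$ for all $B \in Q$. Bound $\min_{P_* \in CO(P)} s(P_*)$ above is the size
of a $\theta(P)$-cardinal partition $P_*$ with $|B| \in \left\{ \left\lfloor \frac{n}{\theta(P)} \right\rfloor, \left\lceil \frac{n}{\theta(P)} \right\rceil \right\}$ for all $B \in P_*$.
In particular, the number of $\left\lfloor \frac{n}{\theta(P)} \right\rfloor$-cardinal blocks is $\theta(P) \left( \left\lfloor \frac{n}{\theta(P)} \right\rfloor + 1 \right) - n$,
while the number of $\left\lceil \frac{n}{\theta(P)} \right\rceil$-cardinal blocks is $n - \theta(P) \left\lfloor \frac{n}{\theta(P)} \right\rfloor$. ■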


Proposition 8 Among complements $Q \in CO(P)$ of any $P \in \mathcal{P}^N$, HD and VI
have common minimizers, i.e. $\arg\min_{Q \in CO(P)} HD(P,Q) = \arg\min_{Q \in CO(P)} VI(P,Q)$, and
common maximizers, i.e. $\arg\max_{Q \in CO(P)} HD(P,Q) = \arg\max_{Q \in CO(P)} VI(P,Q)$.

Proof: Firstly, $Q \in CO(P)$ entails $VI(P,Q) = 2\log n - e(P) - e(Q)$. Thus,
$VI(P,Q)$ is minimized or else maximized when $e(Q)$ is, respectively,
maximized or else minimized. On the other hand, if $P \in \mathcal{P}^N_{mod}$, then all
complements $Q \in CO(P)$ have same rank. Otherwise, as already observed, there
are comparable complements, i.e. with different rank. Therefore, in general,
among complements $Q \in CO(P)$ entropy $e(Q)$ is minimized when $|Q|$ is
minimized and, in addition, $Q \in \mathcal{P}^N_{mod}$. This is precisely where size $s(Q)$ is
maximized. Similarly, $e(Q)$ is maximized when $|Q|$ is maximized and, in addition,
$|B| \in \left\{ \left\lfloor \frac{n}{|Q|} \right\rfloor, \left\lceil \frac{n}{|Q|} \right\rceil \right\}$ for all $B \in Q$. This is where $s(Q)$ is minimized. ■

5 Minimum-weight paths between partitions

This section provides an analysis similar, in spirit, to that provided in [38,
Section 3], although the generic posets and semilattices considered there are
replaced here with the geometric lattice of partitions. Similarly, the covering
graph becomes the graph $G$ of polytope $\mathbb{P}_N$ below, and although posets lack the
join and meet operators, [38] still defines upper/lower valuations, which
correspond to sub/supermodular partition functions in the present setting. Apart
from these differences, the general idea of defining metrics through weighted
paths in the graph induced by the covering relation is the same. In fact,
Hamming distance $|E \Delta E'|$ between edge sets $E, E' \in 2^{N_2}$ is the length of a shortest
path between vertices $\chi_E, \chi_{E'} \in \{0,1\}^{\binom{n}{2}}$ of the $\binom{n}{2}$-dimensional unit hypercube
$[0,1]^{\binom{n}{2}}$, where $\chi_E : N_2 \to \{0,1\}$ is the characteristic function defined
in Section 2, i.e. $\chi_E(\{i,j\}) = 1$ if $\{i,j\} \in E$ and 0 otherwise. Recall that a
polytope naturally defines a graph with its same vertices and edges m p- 93],
and the hypercube is perhaps the main example of polytope. In particular, the
graph of hypercube $[0,1]^{\binom{n}{2}}$ is the Hasse diagram of Boolean lattice $(2^{N_2}, \cap, \cup)$,
for its edges correspond to the covering relation, that is to say $\{E,E'\}$ is an
edge of the hypercube if either $E \supset E'$, $|E| = |E'| + 1$ or else the converse, i.e.
$E' \supset E$, $|E'| = |E| + 1$.

Clearly, a shortest path is a minimum-weight path as long as every edge
has unit weight. This simple observation is the starting point toward an analogous
view of the Hamming distance HD between partitions, namely as the weight
of a minimum-weight path in the associated Hasse diagram when edge weights
are determined by the size. More generally, if edge weights are determined
by a symmetric and strictly order preserving/inverting partition function (like
rank or entropy), then minimum-weight paths across edges of the Hasse
diagram equivalently yield well-defined metric distances. In this view, consider
the convex hull $co.hu(\{I_P : P \in \mathcal{P}^N\}) = \mathbb{P}_N$ whose extreme points [T] [TS] are
all the $B_n$ Boolean $\binom{n}{2}$-vectors defined by the indicator functions $I_P$, $P \in \mathcal{P}^N$
of partitions. Note that $\mathbb{P}_N$ is a (so-called “hull onest”) 0/1-polytope that
might be included in the classifying literature [BEZ] as a type in and of
itself. Here, it may be referred to as “the polytope of partitions”, since its graph
$G = (\mathcal{P}^N, E)$ basically is the Hasse diagram of partition lattice $(\mathcal{P}^N, \wedge, \vee)$.

Specifically, edges correspond to the covering relation between partitions, i.e.
$\{P,Q\} \in E$ if either $[Q,P] = \{P,Q\}$ or else $[P,Q] = \{P,Q\}$ (see above).
Let $P \gtrdot Q \Leftrightarrow [Q,P] = \{P,Q\}$ denote the covering relation between
partitions, while $ex(\mathbb{P}_N) = \{I_P : P \in \mathcal{P}^N\}$ is the set of extreme points or vertices
of $\mathbb{P}_N$. For $N = \{1,2,3\}$, polytope $\mathbb{P}_N$ is strictly included in $[0,1]^3$ and its
five vertices are (0,0,0), (1,0,0), (0,1,0), (0,0,1) and (1,1,1). Thus, vertices
(1,1,0), (1,0,1) and (0,1,1) of $[0,1]^3$ are excluded from $ex(\mathbb{P}_N)$, as they
correspond to those 3 graphs with vertex set {1,2,3} whose edge set is 2-cardinal.
That is, $2^{\binom{n}{2}} - B_n$ is precisely the number of graphs with vertex set $N$ that
do not coincide with their closure (see Sections 1 and 2). Geometrically, for
$N = \{1,2,3\}$, polytope $\mathbb{P}_N$ is the union of a lower tetrahedron, whose volume
is $1/6 \approx 0.167$, and an upper tetrahedron, whose volume is $1/3 \approx 0.333$, hence the
whole volume is 0.5. The former is $co.hu((0,0,0),(1,0,0),(0,1,0),(0,0,1))$, while the
latter is $co.hu((1,0,0),(0,1,0),(0,0,1),(1,1,1))$. Thus $\mathbb{P}_N$ is the polyhedron
obtained as the intersection of 6 half-spaces, namely those three above the
hyperplanes each including one of the three facets (different from unit simplex
$co.hu((1,0,0),(0,1,0),(0,0,1))$) of the lower tetrahedron, and those three below
the hyperplanes each including one of the three facets (again different from the
unit simplex) of the upper tetrahedron. Although this situation for $N = \{1,2,3\}$
is quite simple, for generic $N = \{1,\ldots,n\}$ the associated polytope $\mathbb{P}_N$ is
more complex. When $n = 4$, for example, $\mathbb{P}_N \subset [0,1]^6$ is the convex hull of the
15 vertices corresponding to the rows of Table 2 below, with columns indexed,
from left to right, by the $\binom{4}{2} = 6$ atoms [12], [13], [14], [23], [24] and [34] of
$\mathcal{P}^N$. Corresponding partitions are in the far left column, with vertical bar |
separating blocks.
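
The two tetrahedron volumes quoted for $N = \{1,2,3\}$ follow from the usual determinant formula. The short Python fragment below is a sketch in exact rational arithmetic; the helper names `det3` and `tetrahedron_volume` are illustrative assumptions.

```python
from fractions import Fraction

def det3(rows):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def tetrahedron_volume(v0, v1, v2, v3):
    """Volume = |det[v1 - v0, v2 - v0, v3 - v0]| / 6."""
    edges = [[x - y for x, y in zip(v, v0)] for v in (v1, v2, v3)]
    return Fraction(abs(det3(edges)), 6)

lower = tetrahedron_volume((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
upper = tetrahedron_volume((1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1))
print(lower, upper, lower + upper)  # 1/6 1/3 1/2
```

The two tetrahedra share only the unit simplex as a common facet, so their volumes add up to that of the whole polytope.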


Table 2: Extreme points of 0/1-polytope $\mathbb{P}_N \subset [0,1]^{\binom{n}{2}}$ for $N = \{1,2,3,4\}$

$P \in \mathcal{P}^N$ ; $[ij]$      [12]  [13]  [14]  [23]  [24]  [34]
$P_\bot$ = 1|2|3|4        0     0     0     0     0     0
[12] = 12|3|4             1     0     0     0     0     0
[13] = 13|2|4             0     1     0     0     0     0
[14] = 14|2|3             0     0     1     0     0     0
[23] = 1|23|4             0     0     0     1     0     0
[24] = 1|24|3             0     0     0     0     1     0
[34] = 1|2|34             0     0     0     0     0     1
12|34                     1     0     0     0     0     1
13|24                     0     1     0     0     1     0
14|23                     0     0     1     1     0     0
123|4                     1     1     0     1     0     0
124|3                     1     0     1     0     1     0
134|2                     0     1     1     0     0     1
1|234                     0     0     0     1     1     1
$P^\top$ = 1234           1     1     1     1     1     1
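
The rows of Table 2 can be regenerated mechanically. The following Python fragment is a sketch that enumerates all partitions of $\{1,2,3,4\}$ and builds their indicator vectors over the six atoms, in the same column order as the table; the helper names `set_partitions` and `indicator` are hypothetical.

```python
from itertools import combinations

def set_partitions(elements):
    """Yield every partition of `elements` as a frozenset of frozenset blocks."""
    if not elements:
        yield frozenset()
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        yield smaller | {frozenset([first])}           # `first` in a block of its own
        for block in smaller:                           # or added to an existing block
            yield (smaller - {block}) | {block | {first}}

def indicator(partition, atoms):
    """0/1 vector I_P: entry 1 exactly when both elements of the atom share a block."""
    return tuple(int(any(set(atom) <= block for block in partition))
                 for atom in atoms)

elements = (1, 2, 3, 4)
atoms = list(combinations(elements, 2))   # [12], [13], [14], [23], [24], [34]
vertices = {indicator(p, atoms) for p in set_partitions(elements)}
for vertex in sorted(vertices, key=lambda v: (sum(v), v)):
    print(vertex)
```

Vectors such as (1,1,0,0,0,0) never appear, since a graph with edges {1,2} and {1,3} but without {2,3} does not coincide with its closure.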


As for weights on edges $\{P,Q\} \in E$ of (covering) graph $G$, let $\mathbb{F} \subset \mathbb{R}^{B_n}$ be
the vector space of strictly order-preserving/inverting and symmetric partition
functions $f : \mathcal{P}^N \to \mathbb{R}$. As already mentioned, entropy, rank and size are in
$\mathbb{F}$, and the former is order-inverting, while the latter two are order-preserving.
Given any $f \in \mathbb{F}$, define weights $w_f : E \to \mathbb{R}_{++}$ on edges $\{P,Q\} \in E$ by

$$w_f(\{P,Q\}) = \max\{f(P), f(Q)\} - \min\{f(P), f(Q)\}.$$

For all pairs $P, Q \in \mathcal{P}^N$, let $Path(P,Q)$ contain all $P-Q$-paths in graph $G$,
noting that this latter is highly connected (or dense), as every partition $P$ is
covered by $\binom{|P|}{2}$ partitions $Q$ and covers $\sum_{A \in P} \left( 2^{|A|-1} - 1 \right)$ partitions $Q'$, hence
$|Path(P,Q)| \gg 1$ for all $P, Q$. Recall that a path $p(P,Q) \in Path(P,Q)$ is a
subgraph $p(P,Q) = (V^p_{P,Q}, E^p_{P,Q}) \subset G$ where

$$V^p_{P,Q} = \{P = P_0, P_1, \ldots, P_m = Q\} \text{ and}$$

$$E^p_{P,Q} = \{\{P_0,Q_0\}, \{P_1,Q_1\}, \ldots, \{P_{m-1},Q_{m-1}\}\},$$

with $P_{k+1} = Q_k$ for $0 \leq k < m$. Also, the weight of a path $p(P,Q)$ is

$$w_f(p(P,Q)) = \sum_{0 \leq k < m} w_f(\{P_k, Q_k\}).$$

0<fc<m 

Definition 9 Minimum-f-weight partition distance $\delta_f : \mathcal{P}^N \times \mathcal{P}^N \to \mathbb{R}_+$ is

$$\delta_f(P,Q) := \min_{p(P,Q) \in Path(P,Q)} w_f(p(P,Q)) \text{ for all } f \in \mathbb{F}. \tag{8}$$

Proposition 10 For all $f \in \mathbb{F}$ and all $P, Q \in \mathcal{P}^N$, every minimum-f-weight
$P-Q$-path visits $P \wedge Q$ or $P \vee Q$ or both; that is to say, if path $p(P,Q)$ satisfies
$w_f(p(P,Q)) = \delta_f(P,Q)$, then $V^p_{P,Q} \cap \{P \wedge Q, P \vee Q\} \neq \emptyset$.

Proof: If $P, Q$ are comparable, say $P \geq Q$, then $\{P \vee Q, P \wedge Q\} \subseteq V^p_{P,Q}$
for all paths $p(P,Q) \in Path(P,Q)$, in that $P = P \vee Q$ and $Q = P \wedge Q$; in
particular, if $P \gtrdot Q$, then the unique minimum-f-weight $P-Q$ path consists
of vertices $P$ and $Q$ together with the edge $\{P,Q\} \in E$ linking them. On the
other hand, if $P, Q$ are not comparable, i.e. $P \not\geq Q \not\geq P$, then any path $p(P,Q)$
visits some vertex $P'$ comparable with both $P, Q$, and either $P' > P, Q$ or else
$P, Q > P'$. Hence $p(P,Q) = p(P,P') \cup p(P',Q)$, with $E^p_{P,P'} \cap E^p_{P',Q} = \emptyset$, for some
$P-P'$-path $p(P,P')$ and some $P'-Q$-path $p(P',Q)$, entailing that the weight
of such a $p(P,Q)$ is $w_f(p(P,Q)) = w_f(p(P,P')) + w_f(p(P',Q))$. Finally, since
$f$ is strictly order-preserving/inverting and symmetric, $P' = P \vee Q$ minimizes
$w_f(p(P,P')) + w_f(p(P',Q))$ over all partitions $P' > P, Q$ while $P' = P \wedge Q$
minimizes $w_f(p(P,P')) + w_f(p(P',Q))$ over all $P' < P, Q$. ■

Whether a minimum-f-weight path visits the join or else the meet of any two
incomparable partitions clearly depends on $f$. A generic $f \in \mathbb{F}$ may have
associated minimum-weight paths visiting the meet of some incomparable partitions
$P, Q$ and the join of some others $P', Q'$. In fact, whether minimum-weight paths
always visit the meet or else the join of any two incomparable partitions depends
on whether $f$ or else $-f$ is supermodular. As already observed, if $f$ is
supermodular, then $-f$ is submodular, i.e. $-f(P \wedge Q) - f(P \vee Q) \leq -f(P) - f(Q)$
(and vice versa).

Proposition 11 For any strictly order-preserving $f \in \mathbb{F}$, if $f$ is supermodular,
then the minimum-f-weight partition distance is

$$\delta_f(P,Q) = f(P) + f(Q) - 2f(P \wedge Q),$$

while if $f$ is submodular, then the minimum-f-weight partition distance is

$$\delta_f(P,Q) = 2f(P \vee Q) - f(P) - f(Q).$$

Proof: Supermodularity entails

$$2f(P \vee Q) - f(P) - f(Q) \geq f(P \vee Q) - f(P \wedge Q) \geq f(P) + f(Q) - 2f(P \wedge Q),$$

whereas submodularity entails

$$2f(P \vee Q) - f(P) - f(Q) \leq f(P \vee Q) - f(P \wedge Q) \leq f(P) + f(Q) - 2f(P \wedge Q),$$

for all $P, Q \in \mathcal{P}^N$. ■

Proposition 12 For any strictly order-inverting $f \in \mathbb{F}$, if $f$ is supermodular,
then the minimum-f-weight partition distance is

$$\delta_f(P,Q) = f(P) + f(Q) - 2f(P \vee Q),$$

while if $f$ is submodular, then the minimum-f-weight partition distance is

$$\delta_f(P,Q) = 2f(P \wedge Q) - f(P) - f(Q).$$

Proof: Supermodularity entails

$$2f(P \wedge Q) - f(P) - f(Q) \geq f(P \wedge Q) - f(P \vee Q) \geq f(P) + f(Q) - 2f(P \vee Q),$$

whereas submodularity entails

$$2f(P \wedge Q) - f(P) - f(Q) \leq f(P \wedge Q) - f(P \vee Q) \leq f(P) + f(Q) - 2f(P \vee Q),$$

for all $P, Q \in \mathcal{P}^N$. ■

Since the size $s$ is supermodular (see Proposition 1) and order-preserving,
HD is the minimum-s-weight partition distance, i.e. $HD(P,Q) = \delta_s(P,Q)$ for
all $P, Q$. On the other hand, the rank $r$ is submodular [3, pp. 259, 265, 274] and
order-preserving, hence $\delta_r(P,Q) = 2r(P \vee Q) - r(P) - r(Q) = |P| + |Q| - 2|P \vee Q|$
is the minimum-r-weight partition distance. In particular, $w_r(\{P,Q\}) = 1$ for
all edges $\{P,Q\} \in E$, and therefore $\delta_r$ is in fact the shortest-path distance.
This is detailed below by means of Example 14. Finally, entropy $e$ is
order-inverting and submodular, hence the minimum-e-weight distance $\delta_e$ is the VI
distance $VI(P,Q) = 2e(P \wedge Q) - e(P) - e(Q)$, as shown in Example 13 hereafter.
Propositions 11 and 12 are summarized in Table 3 below.
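
The closed forms for the size and the rank can be compared with minimum-weight paths computed directly on the Hasse diagram. The Python fragment below is a sketch for $N = \{1,2,3,4\}$: it builds the covering graph (two blocks merged per edge), weights each edge by $|f(P) - f(Q)|$, and runs Dijkstra's algorithm from every partition. All helper names are illustrative assumptions.

```python
import heapq
from itertools import combinations

def set_partitions(elements):
    """Yield every partition of `elements` as a frozenset of frozenset blocks."""
    if not elements:
        yield frozenset()
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        yield smaller | {frozenset([first])}
        for block in smaller:
            yield (smaller - {block}) | {block | {first}}

def meet(p, q):
    return frozenset(a & b for a in p for b in q if a & b)

def join(p, q):
    blocks = []
    for new in list(p) + list(q):
        touching = [b for b in blocks if b & new]
        merged = frozenset(new).union(*touching)
        blocks = [b for b in blocks if not (b & new)] + [merged]
    return frozenset(blocks)

def size(p):
    return sum(len(b) * (len(b) - 1) // 2 for b in p)

def rank(p):
    return sum(len(b) for b in p) - len(p)

def build_graph(nodes, f):
    """Hasse diagram with weight |f(P) - f(Q)| on each covering pair."""
    graph = {p: [] for p in nodes}
    for p in nodes:
        for a, b in combinations(p, 2):          # merging two blocks gives a cover of p
            q = (p - {a, b}) | {a | b}
            weight = abs(f(p) - f(q))
            graph[p].append((q, weight))
            graph[q].append((p, weight))
    return graph

def dijkstra(graph, source):
    dist = {source: 0}
    heap = [(0, 0, source)]
    tie = 1                                      # tie-breaker: partitions are never compared
    while heap:
        d, _, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for nxt, weight in graph[node]:
            if d + weight < dist.get(nxt, float("inf")):
                dist[nxt] = d + weight
                heapq.heappush(heap, (d + weight, tie, nxt))
                tie += 1
    return dist

nodes = list(set_partitions((1, 2, 3, 4)))
size_graph, rank_graph = build_graph(nodes, size), build_graph(nodes, rank)
size_ok = all(dijkstra(size_graph, p)[q] == size(p) + size(q) - 2 * size(meet(p, q))
              for p in nodes for q in nodes)
rank_ok = all(dijkstra(rank_graph, p)[q] == 2 * rank(join(p, q)) - rank(p) - rank(q)
              for p in nodes for q in nodes)
print(len(nodes), size_ok, rank_ok)
```

Integer-valued weights are used here so that the comparison is exact; the entropy case would need a floating-point tolerance.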

Example 13 Entropy-based minimum-weight path distance: for any two
atoms $[ij], [ij'] \in \mathcal{P}^N$ such that $\{i,j\} \cap \{i,j'\} = \{i\}$, the VI distance is

$$VI([ij],[ij']) = 2e([ij] \wedge [ij']) - e([ij]) - e([ij']) = 2\log n - 2\left(\log n - \frac{2}{n}\right) = \frac{4}{n},$$

and this is indeed the minimum-e-weight distance. On the other hand,

$$e([ij]) + e([ij']) - 2e([ij] \vee [ij']) = 2\left(\log n - \frac{2}{n}\right) - 2\left(\log n - \frac{3}{n}\log 3\right) = \frac{2}{n}(3\log 3 - 2).$$

In fact, $\frac{4}{n} = VI([ij],[ij']) < \frac{2}{n}(3\log 3 - 2)$ as $4 < 3\log 3$.

Example 14 Rank-based shortest path distance: let $N = \{1,2,3,4,5,6,7\}$
and consider partitions $P = 135|27|46$ and $Q = 1|23|47|56$ (with vertical bar |
separating blocks as in Table 2 above). Then, $P \wedge Q = 1|2|3|4|5|6|7 = P_\bot$ as
well as $P \vee Q = 1234567 = P^\top$. Accordingly,

$$\delta_r(P,Q) = 2r(P \vee Q) - r(P) - r(Q) = |P| + |Q| - 2|P \vee Q| = 3 + 4 - 2 = 5$$

while $r(P) + r(Q) - 2r(P \wedge Q) = 2|P \wedge Q| - |P| - |Q| = 14 - 3 - 4 = 7$,

as $|P| + |Q| - 2|P \vee Q| = 5$ is the length of a shortest path between $P$ and $Q$.
Such a path visits $P \vee Q = P^\top$ and for instance may be across edges

$\{P, 12357|46\}$, $\{12357|46, P^\top\}$, $\{P^\top, 123|4567\}$, $\{123|4567, 1|23|4567\}$

and finally $\{1|23|4567, Q\}$ of $\mathbb{P}_N$ (or equivalently of Hasse diagram $G$ introduced
above). On the other hand, a shortest $P-Q$-path forced to visit $P \wedge Q = P_\bot$
has length 7 and for instance may be across edges

$\{P, 1|35|27|46\}$, $\{1|35|27|46, 1|2|35|46|7\}$, $\{1|2|35|46|7, 1|2|3|46|5|7\}$,

$\{1|2|3|46|5|7, P_\bot\}$, $\{P_\bot, 1|23|4|5|6|7\}$, $\{1|23|4|5|6|7, 1|23|47|5|6\}$

and finally $\{1|23|47|5|6, Q\}$. Note that the rank assigns to every edge $\{P,Q\}$ of
$\mathbb{P}_N$ unit weight $w_r(\{P,Q\}) = 1$, and thus $\delta_r$ is indeed the shortest path distance.


Table 3: $\delta_f(P,Q)$ for $f$ symmetric, strictly order preserving/inverting,
super/submodular.

f symmetric        f strictly order-preserving        f strictly order-inverting
f supermodular     $f(P) + f(Q) - 2f(P \wedge Q)$     $f(P) + f(Q) - 2f(P \vee Q)$
f submodular       $2f(P \vee Q) - f(P) - f(Q)$       $2f(P \wedge Q) - f(P) - f(Q)$


6 Distinctions, co-atoms and fields 

A further measure of partition entropy, called logical entropy, has been recently
proposed m in terms of distinctions, i.e. ordered pairs $(i,j) \in N \times N$ (see
Section 2). In statistical classification, the same concept is also referred to as the
“Gini coefficient” pp. 53-54, 247-250, 257, 334]. If distinctions are replaced
with unordered pairs $\{i,j\} \in N_2$, then mutatis mutandis the non-normalized
logical entropy of partitions $P$ is the analog of $\binom{n}{2} - s(P)$, providing a further
minimum-weight partition distance. Furthermore, since in information theory
partitions are generally evaluated by means of order-inverting functions, the
approach developed thus far may be applied to the upside-down Hasse diagram
of the partition lattice, with co-atoms (or dual atoms sa) in place of atoms. In
this way, the distance between partitions is the distance between the associated
fields of subsets.

A partition $P$ distinguishes between $i \in N$ and $j \in N \setminus i$ if $i \in A \in P$ while
$j \in B \in P$ with $A \neq B$, and the set of such distinctions has been recently
proposed as the logical analog of the complement of $P$, with the (normalized)
number of distinctions providing a novel measure of the (logical) entropy of
partitions [Til ITS]. In particular, this is achieved through apartness binary
relations PA, which are the complement of equivalence relations P (see again
Section 2). In terms of atoms $[ij] \in \mathcal{P}^N$ of the partition lattice, the logical
entropy $h : \mathcal{P}^N \to \mathbb{R}_+$ of partitions [TH p. 127] is

$$h(P) = \frac{2|\{[ij] : P \not\geq [ij]\}|}{n^2} = \frac{2\left(\binom{n}{2} - s(P)\right)}{n^2} = \frac{n(n-1) - 2s(P)}{n^2},$$

with $h(P^\top) = 0 = s(P_\bot)$ and $h(P_\bot) = \frac{n-1}{n} = \frac{2s(P^\top)}{n^2}$.

Proposition 15 The logical entropy-based minimum-weight distance $\delta_h$ is
$\delta_h(P,Q) = 2h(P \wedge Q) - h(P) - h(Q)$ for all $P, Q \in \mathcal{P}^N$.

Proof: Logical entropy $h$ satisfies $h \in \mathbb{F}$ and is strictly order-inverting. Also,
apart from constant terms, $h$ varies with $-s$, which is submodular because $s$ is
supermodular. That is to say,

$$h(P) + h(Q) = \frac{2}{n}\left[n - 1 - \frac{s(P) + s(Q)}{n}\right],$$

$$h(P \wedge Q) + h(P \vee Q) = \frac{2}{n}\left[n - 1 - \frac{s(P \wedge Q) + s(P \vee Q)}{n}\right].$$

Thus $s(P \wedge Q) + s(P \vee Q) \geq s(P) + s(Q) \Rightarrow h(P \wedge Q) + h(P \vee Q) \leq h(P) + h(Q)$
and the desired conclusion follows from Proposition 12. ■
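
As a small numerical illustration of the logical entropy and of the distance in Proposition 15, the following Python sketch uses exact fractions. The string encoding of partitions and the names `parse`, `size`, `meet`, `logical_entropy` and `logical_distance` are assumptions made only for this example.

```python
from fractions import Fraction

def parse(text):
    """Turn a string such as '12|3' into a list of blocks (frozensets)."""
    return [frozenset(block) for block in text.split("|")]

def size(partition):
    """s(P): sum over blocks of C(|A|, 2)."""
    return sum(len(b) * (len(b) - 1) // 2 for b in partition)

def meet(p, q):
    """P ^ Q: the non-empty pairwise intersections of blocks."""
    return [a & b for a in p for b in q if a & b]

def logical_entropy(partition):
    """h(P) = (n(n-1) - 2 s(P)) / n^2."""
    n = sum(len(b) for b in partition)
    return Fraction(n * (n - 1) - 2 * size(partition), n * n)

def logical_distance(p, q):
    """2 h(P ^ Q) - h(P) - h(Q), the closed form stated in Proposition 15."""
    return 2 * logical_entropy(meet(p, q)) - logical_entropy(p) - logical_entropy(q)

p, q = parse("12|3"), parse("1|23")
print(logical_entropy(parse("1|2|3")), logical_entropy(parse("123")),
      logical_distance(p, q))  # 2/3 0 4/9
```

Since $h$ is an affine function of $-s$, this distance equals $2\,HD(P,Q)/n^2$; here $HD = 2$ and $n = 3$ give $4/9$.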

A field of subsets is a set system $\mathcal{F} \subseteq 2^N$ closed under union, intersection and
complementation, i.e. $A \cap B, A \cup B, A^c \in \mathcal{F}$ for all $A, B \in \mathcal{F}$. Every partition
$P \in \mathcal{P}^N$ generates the field $\mathcal{F}_P \subseteq 2^N$ containing all subsets $B \in 2^N$ obtained
as the union of blocks $A \in P$, with $\mathcal{F}_{P_\bot} = 2^N$ and $\mathcal{F}_{P^\top} = \{\emptyset, N\}$. There
are $2^{n-1} - 1$ minimal fields (generated by partitions) that strictly include $\mathcal{F}_{P^\top}$;
they are those $\mathcal{F}_A = \mathcal{F}_{A^c} = \{\emptyset, A, A^c, N\}$ with $\emptyset \subset A \subset N$. On the other hand,
2-cardinal partitions $\{A, A^c\} \in \mathcal{P}^N$ are the co-atoms [5] (or dual atoms [JS])
of partition lattice $(\mathcal{P}^N, \wedge, \vee)$ ordered by coarsening. In fact, in information
theory finer partitions are generally more valuable than coarser ones, and thus
attention is placed on order-inverting partition functions. In this view, the
partition lattice is often dealt with as ordered by refinement and thus with the
upside-down Hasse diagram. Accordingly, a distance between partitions also
obtains by counting co-atoms rather than atoms. To this end, define the co-size
$cs : \mathcal{P}^N \to \mathbb{Z}_+$ by $cs(P) = |\{\{A, A^c\} : P \leq \{A, A^c\}\}|$, with $cs(P_\bot) = 2^{n-1} - 1$
and $cs(P^\top) = 0$. In words, $cs(P)$ is the number of co-atoms coarser than $P$.


Proposition 16 The minimum-cs-weight partition distance is

$$\delta_{cs}(P,Q) = cs(P) + cs(Q) - 2cs(P \vee Q) \text{ for all } P, Q \in \mathcal{P}^N.$$

Proof: Denote by $\mu^{cs} : \mathcal{P}^N \to \mathbb{Z}$ the Möbius inversion from above [3] of
the co-size, with $cs(P) = \sum_{Q \geq P} \mu^{cs}(Q)$ for all $P$. By definition, $\mu^{cs}(P) = 1$
if $|P| = 2$ and 0 otherwise. Like for the size in Proposition 1, this entails
supermodularity, i.e. $cs(P \wedge Q) + cs(P \vee Q) \geq cs(P) + cs(Q)$. Furthermore,
$cs \in \mathbb{F}$ is order-inverting. Therefore,

$$cs(P) + cs(Q) - 2cs(P \vee Q) \leq cs(P \wedge Q) - cs(P \vee Q) \leq 2cs(P \wedge Q) - cs(P) - cs(Q)$$

for all $P, Q \in \mathcal{P}^N$. ■

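Denote by $(\mathfrak{F}, \sqcap, \sqcup)$ the lattice whose elements are the $B_n$ fields of subsets
$\mathcal{F}_P$ generated by partitions $P \in \mathcal{P}^N$, ordered by inclusion $\subseteq$. The meet and
join are, respectively, $\mathcal{F}_P \sqcap \mathcal{F}_Q = \mathcal{F}_{P \vee Q}$ and $\mathcal{F}_P \sqcup \mathcal{F}_Q = \mathcal{F}_{P \wedge Q}$. The set of
atoms is the collection $\{\mathcal{F}_{\{A,A^c\}} : \emptyset \subset A \subset N\}$ of minimal fields; that is to
say, $\mathcal{F}_P = \bigsqcup_{\{A,A^c\} \geq P} \mathcal{F}_{\{A,A^c\}}$ for all $\mathcal{F}_P \in \mathfrak{F}$. Therefore, $\delta_{cs}(P,Q)$ may also be
regarded as an analog of the traditional Hamming distance between subsets:

$$\delta_{cs}(P,Q) = |\{\{A,A^c\} : \mathcal{F}_{\{A,A^c\}} \subseteq \mathcal{F}_P\}| + |\{\{A,A^c\} : \mathcal{F}_{\{A,A^c\}} \subseteq \mathcal{F}_Q\}| - 2|\{\{A,A^c\} : \mathcal{F}_{\{A,A^c\}} \subseteq (\mathcal{F}_Q \cap \mathcal{F}_P)\}|.$$

In words, this is the number of minimal fields $\mathcal{F}_{\{A,A^c\}}$ included in either $\mathcal{F}_P$ or
else in $\mathcal{F}_Q$, but not in both.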
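
The co-size and the associated distance can be illustrated by listing co-atoms explicitly. The Python fragment below is a sketch that enumerates the 2-block partitions coarser than a given partition and counts those lying above exactly one of two partitions; the helper names `parse`, `co_atoms_above`, `co_size` and `co_size_distance` are hypothetical.

```python
from itertools import combinations

def parse(text):
    """Turn a string such as '12|3' into a list of blocks (frozensets)."""
    return [frozenset(block) for block in text.split("|")]

def co_atoms_above(partition):
    """All 2-block partitions {A, A^c} coarser than the given partition."""
    blocks = list(partition)
    first, rest = blocks[0], blocks[1:]
    found = set()
    # The side containing `first` takes any subset of the other blocks except all
    # of them, so that the opposite side is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            side = frozenset().union(first, *extra)
            other = frozenset().union(*(b for b in rest if b not in extra))
            found.add(frozenset({side, other}))
    return found

def co_size(partition):
    """cs(P): number of co-atoms coarser than P, equal to 2^(|P|-1) - 1."""
    return len(co_atoms_above(partition))

def co_size_distance(p, q):
    """Number of co-atoms above exactly one of p, q, i.e. cs(P) + cs(Q) - 2 cs(P v Q)."""
    return len(co_atoms_above(p) ^ co_atoms_above(q))

p, q, bottom = parse("12|3"), parse("1|23"), parse("1|2|3")
print(co_size(bottom), co_size(p), co_size_distance(p, q), co_size_distance(bottom, p))
```

The symmetric difference is used because a co-atom is coarser than both $P$ and $Q$ exactly when it is coarser than their join.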
7 Appendix: Euclidean distance between fuzzy partitions

The leading idea of this section is to propose a measure of the distance
between fuzzy partitions, as in [5]. Together with theoretical worthiness, from
an applicative perspective this distance is useful for comparing alternative
results of objective function-based fuzzy clustering algorithms (such as the fuzzy
C-means, see Eiiiia for a comprehensive treatment). More precisely, these
algorithms usually rely on local search methods, and their output takes the
form of a membership matrix, where rows and columns are indexed by data and
clusters, respectively. For a given data set, the chosen algorithm typically
outputs different membership matrices depending on alternative initial candidate
solutions and/or parametrizations, and these varying outputs are commonly
ranked through a validity index (see [53] for a recent overview). A key input
is the desired number of clusters, which is not chosen autonomously through
optimization, but is instead kept fixed over the search. Conceiving
several runs for each reasonable number of clusters, a common situation is thus
one where alternative outputs score best on the chosen validity index. Then,
the proposed distance measure makes it possible to compare these outputs, each with a
different number of clusters and with highest validity score for that number.

Fuzzy clusterings are collections $A_1,\dots,A_m \subseteq N$ of subsets of $N$ endowed
with $n$ membership distributions $x_{il}$, $1 \le i \le n$, $1 \le l \le m$, where $x_{il} \in [0,1]$
quantifies the membership of $i \in N$ in $A_l$, $1 \le l \le m$. A fuzzy clustering
thus is an $m$-collection of fuzzy subsets $(x_{1l},\dots,x_{nl}) \in [0,1]^n$, $1 \le l \le m$ of
$N$ [?], and $m \in \{1,\dots,2^n-1\}$ since every non-empty subset $A_l \neq \emptyset$ may
have an associated fuzzy subset $(x_{1l},\dots,x_{nl}) \in [0,1]^n$. Membership matrices
$x \in [0,1]^{n \times m}$ satisfy $\sum_{1 \le l \le m} x_{il} = 1$ for all $i \in N$. The traditional Euclidean
distance $d(x,x')$ between $x = (x_1,\dots,x_n)$, $x' = (x'_1,\dots,x'_n) \in [0,1]^n$ simply
is $d(x,x') = \sqrt{\sum_{1 \le i \le n} (x_i - x'_i)^2}$, i.e. the $\ell_2$ norm in $\mathbb{R}^n$. For measuring the
same distance $d(x,x')$ between $x \in [0,1]^{n \times m}$ and $x' \in [0,1]^{n \times m'}$ it must be
$m = m'$, in which case $d(x,x') = \sqrt{\sum_{1 \le i \le n} \sum_{1 \le l \le m} (x_{il} - x'_{il})^2}$. Yet, as already
observed, very likely there are fuzzy clusterings with high scores in terms of the
chosen validity index such that $m \neq m'$. In this view, the proposed method is
dimension-free, i.e. it applies regardless of whether $m = m'$ or $m \neq m'$.

When considering that the $n$ singletons $\{i\}$, $i \in N$ are the atoms of Boolean
lattice $(2^N, \cap, \cup)$, a fuzzy subset is readily seen to consist, in fact, of $n$
memberships in $[0,1]$ indexed by the $n$ atoms. In this view, from
a combinatorial perspective fuzzy elements of atomic lattices may be defined
to be collections of $[0,1]$-memberships, one for each atom. Insofar as lattice
theory is concerned, fuzzy partitions may thus be regarded as points in the
0/1-polytope $\mathbb{P}_N$ introduced in Section 5, with variables $y_{[ij]}$ indexed by atoms $[ij]$.

Definition 17 A fuzzy partition is any $y = (y_{[ij]})_{1 \le i < j \le n} \in \mathbb{P}_N$, and
$y = I_P \in ex(\mathbb{P}_N)$ is in fact non-fuzzy (or hard), while $y \in \mathbb{P}_N \setminus ex(\mathbb{P}_N)$ is
properly fuzzy.

A fuzzy partition thus is a point in the polytope $\mathbb{P}_N \subset [0,1]^{\binom{n}{2}}$ included in
the $\binom{n}{2}$-cube, with axes indexed by atoms $[ij]$, and a non-fuzzy partition
$P \in \mathcal{P}^N$ corresponds to a vertex of $\mathbb{P}_N$ identified by indicator function $I_P$.
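Denote by $\mathcal{M}_N = \bigcup_{1 \le m \le 2^n - 1} [0,1]^{n \times m}$ the set of all membership matrices. For
$x = [x_{il}]_{1 \le i \le n, 1 \le l \le m} \in \mathcal{M}_N$, let $A_1,\dots,A_m$ be the associated subsets, i.e. $x_{il}$ is
the membership of $i$ in $A_l$, while $\sum_{1 \le l \le m} x_{il} = 1$ for all $i \in N$. Thus, for instance,
if $m = 1$, then $x_{i1} = 1$ for $1 \le i \le n$.

Proposition 18 Mapping $\eta : \mathcal{M}_N \to [0,1]^{\binom{n}{2}}$ defined by

$\eta_{[ij]}(x) = \sum_{1 \le l \le m} (x_{il} \cdot x_{jl})$ for all atoms $[ij]$    (10)

satisfies: (i) if $x_{il} \in \{0,1\}$ for all $i,l$, then $\eta(x) \in ex(\mathbb{P}_N)$, and (ii) $\eta(x) \in \mathbb{P}_N$.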

Proof: Firstly, $\sum_{1 \le l \le m} x_{il} = 1$ for all $i \in N$ entails that the summation yields
a positive quantity never exceeding 1, that is $\eta(x) \in [0,1]^{\binom{n}{2}}$. Concerning (i), if
$x_{il} \in \{0,1\}$, $1 \le i \le n$, $1 \le l \le m$, then $x$ corresponds to a non-fuzzy partition
$P \in \mathcal{P}^N$, i.e. $x_{il} = 1$ if $i \in A_l$ and $x_{il} = 0$ if $i \notin A_l$, for $1 \le i \le n$, $1 \le l \le m$. The $m$ columns
$(x_{1l},\dots,x_{nl})^T$, $1 \le l \le m$ of $x$ are thus given by the $m$ characteristic functions
$\chi_{A_l}$, $1 \le l \le m$ of subsets $A_1,\dots,A_m$ (see above), with $P = \{A_1,\dots,A_m\}$ for
some partition $P \in \mathcal{P}^N$. Hence $\eta(x) = I_P$, in that

$\eta_{[ij]}(x) = I_P([ij]) = 1$ if $\{i,j\} \subseteq A_l$ for some $l \in \{1,\dots,m\}$,
$\eta_{[ij]}(x) = I_P([ij]) = 0$ if $\{i,j\} \not\subseteq A_l$ for all $l \in \{1,\dots,m\}$,

for all atoms $[ij]$. Finally, coming to (ii), observe that $\eta(x)$ obtains as
a suitable convex combination of vertices $I_{P_1},\dots,I_{P_h} \in ex(\mathbb{P}_N)$ of the polytope.
That is to say, $\eta(x) = \alpha_{P_1} I_{P_1} + \cdots + \alpha_{P_h} I_{P_h}$ with $\alpha_{P_1},\dots,\alpha_{P_h} > 0$ and
$\sum_{1 \le h' \le h} \alpha_{P_{h'}} = 1$. These partitions $P_{h'}$ and coefficients $\alpha_{P_{h'}}$, $1 \le h' \le h$
are determined through a fairly simple recursive procedure. Starting from
the top partition $P_1 = P^\top$, with coefficient $\alpha_{P^\top} = \min_{[ij]} \eta_{[ij]}(x)$, let $[ij]^1$
be the atom corresponding to this minimum. Next, atom $[ij]^2$ corresponds
to minimum $\min_{[ij] \neq [ij]^1} \eta_{[ij]}(x)$ and $P_2 < P^\top$ is a coarsest partition satisfying
$[ij]^1 \not\le P_2 \ge [ij]^2$, while coefficient $\alpha_{P_2} = \eta_{[ij]^2}(x) - \eta_{[ij]^1}(x)$ obtains
incrementally. At the generic $h'$-th step, atom $[ij]^{h'}$ corresponds to minimum
$\min_{[ij] \neq [ij]^1,\dots,[ij]^{h'-1}} \eta_{[ij]}(x)$, taken after updating, while the selected partition $P_{h'}$ is a
coarsest one satisfying $[ij]^1,\dots,[ij]^{h'-1} \not\le P_{h'} \ge [ij]^{h'}$, and the coefficient $\alpha_{P_{h'}}$
again obtains incrementally: as in Example 19 below, it is the amount by which
$\eta_{[ij]^{h'}}(x)$ exceeds the sum of the coefficients already assigned to those previously
selected partitions that are coarser than $[ij]^{h'}$. These steps continue through partitions that
are either finer or else incomparable with respect to the previous ones, while
reaching the atoms themselves and, if necessary, the bottom $P_\bot$ too. ■
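
As an illustration of expression (10), the following minimal Python sketch computes the mapping for a membership matrix given as a list of rows. The function name `eta` and the dictionary representation keyed by pairs (i, j) with i < j are assumptions made here for illustration; they do not appear in the text. The small check below reproduces property (i) of Proposition 18 on a hard clustering.

```python
from itertools import combinations


def eta(x):
    """Expression (10): map an n-by-m membership matrix (list of rows)
    to a dict {(i, j): sum_l x[i][l] * x[j][l]} over pairs i < j (1-based)."""
    n = len(x)
    return {
        (i + 1, j + 1): sum(a * b for a, b in zip(x[i], x[j]))
        for i, j in combinations(range(n), 2)
    }


# Hard clustering 12|34 of N = {1, 2, 3, 4}: each row is a 0/1 unit vector.
hard = [[1, 0], [1, 0], [0, 1], [0, 1]]
y_hard = eta(hard)

# Indicator function of partition 12|34 on the six atoms [ij].
indicator_12_34 = {(1, 2): 1, (1, 3): 0, (1, 4): 0,
                   (2, 3): 0, (2, 4): 0, (3, 4): 1}

# A properly fuzzy matrix with rows summing to 1 lands inside the unit cube.
fuzzy = [[0.5, 0.5], [0.25, 0.75], [1.0, 0.0]]
y_fuzzy = eta(fuzzy)
```

Since the image only has one coordinate per atom, matrices with different numbers of columns are mapped into the same space, which is the sense in which the method is dimension-free.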


Example 19 For $N = \{1,2,3,4\}$, consider the collections
$\{A_1,A_2,A_3\} = \{\{1,2,3\},\{1,4\},\{2,3,4\}\}$ and
$\{A'_1,A'_2,A'_3,A'_4\} = \{\{1,2\},\{2,3\},\{3,4\},\{1,2,3,4\}\}$ of subsets, with
membership matrices $x \in [0,1]^{4 \times 3}$ and $x' \in [0,1]^{4 \times 4}$ given by:
$x_{11} = 0.7$, $x_{12} = 0.3$, $x_{13} = 0$ and
$x_{21} = 0.4$, $x_{22} = 0$, $x_{23} = 0.6$ and
$x_{31} = 0.2$, $x_{32} = 0$, $x_{33} = 0.8$ and
$x_{41} = 0$, $x_{42} = 0.5$, $x_{43} = 0.5$ for the former, while
$x'_{11} = 0.4$, $x'_{12} = 0 = x'_{13}$, $x'_{14} = 0.6$ and
$x'_{21} = 0.2$, $x'_{22} = 0.3$, $x'_{23} = 0$, $x'_{24} = 0.5$ and
$x'_{31} = 0$, $x'_{32} = 0.3$, $x'_{33} = 0.4$, $x'_{34} = 0.3$ and
$x'_{41} = 0 = x'_{42}$, $x'_{43} = 0.8$, $x'_{44} = 0.2$ for the latter. Let
$y_{ij} = \eta_{[ij]}(x)$ and $y'_{ij} = \eta_{[ij]}(x')$
for notational convenience. Expression (10) yields:
$y_{12} = 0.7 \cdot 0.4 = 0.28$,
$y_{13} = 0.7 \cdot 0.2 = 0.14$,
$y_{14} = 0.3 \cdot 0.5 = 0.15$,
$y_{23} = 0.4 \cdot 0.2 + 0.6 \cdot 0.8 = 0.08 + 0.48 = 0.56$,
$y_{24} = 0.6 \cdot 0.5 = 0.3$,
$y_{34} = 0.8 \cdot 0.5 = 0.4$ for the former collection, and
$y'_{12} = 0.4 \cdot 0.2 + 0.6 \cdot 0.5 = 0.08 + 0.3 = 0.38$,
$y'_{13} = 0.6 \cdot 0.3 = 0.18$,
$y'_{14} = 0.6 \cdot 0.2 = 0.12$,
$y'_{23} = 0.3 \cdot 0.3 + 0.5 \cdot 0.3 = 0.09 + 0.15 = 0.24$,
$y'_{24} = 0.5 \cdot 0.2 = 0.1$,
$y'_{34} = 0.4 \cdot 0.8 + 0.3 \cdot 0.2 = 0.32 + 0.06 = 0.38$ for the latter. Concerning
the convex combinations corresponding to $\eta(x) = y$ and $\eta(x') = y'$, for the
former collection $\{\{1,2,3\},\{1,4\},\{2,3,4\}\}$, since $y_{13} \le y_{ij}$, $1 \le i < j \le 4$,
firstly $P_1 = P^\top$ and $\alpha_{P_1} = \alpha_{P^\top} = 0.14 = y_{13}$, thus partitions $P_2,P_3,\dots$ coming
next satisfy $P_2,P_3,\dots \not\ge [13]$. As $y_{14} = 0.15 < y_{ij}$ for $[ij] \neq [13],[14]$, a
coarsest $P \ge [14]$ is $P_2 = 124|3$. Hence $\alpha_{P_2} = \alpha_{124|3} = 0.15 - 0.14 = 0.01$ and
therefore $P \not\ge [13],[14]$ for all subsequent partitions $P$. The new minimum is
$y_{12} = 0.28$, and the above constraints yield $P_3 = 12|34$ as the coarsest available
partition, with $\alpha_{12|34} = 0.28 - 0.15 = 0.13$. After updating, $y_{34} = 0.4$ is the
novel minimum, with $P_4 = 1|234$ and $\alpha_{1|234} = 0.4 - 0.13 - 0.14 = 0.13$. The
last partitions $P_5,P_6$ are atoms themselves, namely $P_5 = [24]$ and $P_6 = [23]$,
with associated coefficients $\alpha_{1|24|3} = 0.3 - 0.13 - 0.01 - 0.14 = 0.02$ as well as
$\alpha_{1|23|4} = 0.56 - 0.13 - 0.14 = 0.29$. Since the sum of these six coefficients yields
0.72, the bottom partition finally has coefficient $\alpha_{P_\bot} = 1 - 0.72 = 0.28$. Thus
the sought convex combination of indicator functions or vertices $I_P \in ex(\mathbb{P}_N)$ is

$y = 0.14 \cdot I_{1234} + 0.01 \cdot I_{124|3} + 0.13 \cdot I_{12|34} + 0.13 \cdot I_{1|234} + 0.02 \cdot I_{1|24|3} + 0.29 \cdot I_{1|23|4} + 0.28 \cdot I_{1|2|3|4}$.

A generic point in polytope $\mathbb{P}_N$ generally admits alternative (equivalent) convex
combinations of vertices. For instance, $y$ also admits

$y = 0.15 \cdot I_{14|2|3} + 0.14 \cdot I_{123|4} + 0.14 \cdot I_{12|3|4} + 0.3 \cdot I_{1|234} + 0.1 \cdot I_{1|2|34} + 0.12 \cdot I_{1|23|4} + 0.05 \cdot I_{1|2|3|4}$.

Coming to the second collection $\{\{1,2\},\{2,3\},\{3,4\},\{1,2,3,4\}\}$ of subsets, the
first coefficient is $\alpha'_{P^\top} = y'_{24} = 0.1$ since $y'_{24} \le y'_{ij}$, $1 \le i < j \le 4$. Next, rather
straightforwardly,
$\alpha'_{[12]} = \alpha'_{12|3|4} = y'_{12} - y'_{24} = 0.38 - 0.1 = 0.28$,
$\alpha'_{[13]} = \alpha'_{13|2|4} = y'_{13} - y'_{24} = 0.18 - 0.1 = 0.08$,
$\alpha'_{[14]} = \alpha'_{14|2|3} = y'_{14} - y'_{24} = 0.12 - 0.1 = 0.02$,
$\alpha'_{[23]} = \alpha'_{1|23|4} = y'_{23} - y'_{24} = 0.24 - 0.1 = 0.14$,
$\alpha'_{[34]} = \alpha'_{1|2|34} = y'_{34} - y'_{24} = 0.38 - 0.1 = 0.28$. These six coefficients add up
to 0.9, hence the bottom partition has coefficient $\alpha'_{P_\bot} = 1 - 0.9 = 0.1$. A sought
convex combination thus is

$y' = 0.1 \cdot I_{1234} + 0.02 \cdot I_{14|2|3} + 0.08 \cdot I_{13|2|4} + 0.14 \cdot I_{1|23|4} + 0.28 \cdot I_{12|3|4} + 0.28 \cdot I_{1|2|34} + 0.1 \cdot I_{1|2|3|4}$.
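
The arithmetic of Example 19 can be checked mechanically. The sketch below is only an illustration: the helper names `eta` and `indicator` are hypothetical and not part of the text. It recomputes $y$ from the first membership matrix via expression (10) and rebuilds it from the first convex combination of vertices given in the example.

```python
from itertools import combinations


def eta(x):
    """Expression (10) on a membership matrix given as a list of rows."""
    n = len(x)
    return {(i + 1, j + 1): sum(a * b for a, b in zip(x[i], x[j]))
            for i, j in combinations(range(n), 2)}


def indicator(blocks, n):
    """Indicator function I_P of a partition P given as a list of blocks."""
    return {(i, j): int(any(i in b and j in b for b in blocks))
            for i, j in combinations(range(1, n + 1), 2)}


# First membership matrix of Example 19 (rows = elements 1..4).
x = [[0.7, 0.3, 0.0], [0.4, 0.0, 0.6], [0.2, 0.0, 0.8], [0.0, 0.5, 0.5]]
y = eta(x)

# First convex combination of Example 19: (coefficient, partition) pairs.
combo = [
    (0.14, [{1, 2, 3, 4}]),
    (0.01, [{1, 2, 4}, {3}]),
    (0.13, [{1, 2}, {3, 4}]),
    (0.13, [{1}, {2, 3, 4}]),
    (0.02, [{1}, {2, 4}, {3}]),
    (0.29, [{1}, {2, 3}, {4}]),
    (0.28, [{1}, {2}, {3}, {4}]),
]
rebuilt = {a: sum(c * indicator(p, 4)[a] for c, p in combo) for a in y}

# Largest coordinate-wise discrepancy and total weight of the combination.
max_gap = max(abs(y[a] - rebuilt[a]) for a in y)
total_weight = sum(c for c, _ in combo)
```

Up to floating-point rounding, `max_gap` vanishes and `total_weight` equals 1, so the listed coefficients do form a convex combination reproducing $y$.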

The distances between fuzzy partitions $y, y' \in \mathbb{P}_N$ given by the
$\ell_1$ and $\ell_2$ norms, denoted by $d_1(y,y')$ and $d_2(y,y')$ respectively, are the usual
distances between points in a Euclidean vector space (i.e. $\mathbb{R}^{\binom{n}{2}}$), namely

$d_1(y,y') = \sum_{1 \le i < j \le n} abs(y_{[ij]} - y'_{[ij]})$ and $d_2(y,y') = \sqrt{\sum_{1 \le i < j \le n} (y_{[ij]} - y'_{[ij]})^2}$,

where $abs(\alpha - \beta) = \max\{\alpha,\beta\} - \min\{\alpha,\beta\}$ is the absolute value. Both are
well-known metrics (see above). In particular, triangle inequality may be considered
in conjunction with the order relation and the meet of fuzzy partitions.
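
To make the two distances concrete, the following sketch evaluates them on the fuzzy partitions $y$ and $y'$ of Example 19, which come from membership matrices with 3 and 4 columns respectively. The function names `d1` and `d2` mirror the notation of the text, while the dictionary representation keyed by pairs is an assumption made here for illustration.

```python
import math


def d1(y, z):
    """l1 distance between two fuzzy partitions indexed by the same atoms."""
    return sum(abs(y[a] - z[a]) for a in y)


def d2(y, z):
    """l2 (Euclidean) distance between two fuzzy partitions."""
    return math.sqrt(sum((y[a] - z[a]) ** 2 for a in y))


# Values y_ij and y'_ij computed in Example 19 through expression (10).
y = {(1, 2): 0.28, (1, 3): 0.14, (1, 4): 0.15,
     (2, 3): 0.56, (2, 4): 0.30, (3, 4): 0.40}
y_prime = {(1, 2): 0.38, (1, 3): 0.18, (1, 4): 0.12,
           (2, 3): 0.24, (2, 4): 0.10, (3, 4): 0.38}

dist1 = d1(y, y_prime)   # 0.71 up to floating-point rounding
dist2 = d2(y, y_prime)   # square root of 0.1553
```

Both values are well defined although the two underlying clusterings have different numbers of clusters, since the comparison takes place over the six atoms of the partition lattice of a 4-set.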

7.1 Order, meet and join 

The order relation $\ge$, the meet $\wedge$ and the join $\vee$ for partitions $P,Q \in \mathcal{P}^N$ may
be extended from vertices $I_P, I_Q$ of polytope $\mathbb{P}_N$ to the whole of this latter.
Specifically, $P \ge Q \Leftrightarrow I_P([ij]) \ge I_Q([ij])$ for all atoms $[ij]$. In the same way,
for any two fuzzy partitions $y, y' \in \mathbb{P}_N$,

$y \ge y' \Leftrightarrow y_{[ij]} \ge y'_{[ij]}$ for all atoms $[ij]$.

For the discrete setting provided by vertices of the polytope, the following
condition has been already considered in terms of "vertical collinearity" [31] or
"interval betweenness" (for valuations of distributive lattices) [38].

Proposition 20 For any fuzzy partitions $y, z, y' \in \mathbb{P}_N$, if $y \ge z \ge y'$, then
$d_1(y,z)$, $d_1(z,y')$ and $d_1(y,y')$ satisfy triangle inequality with equality, that is

$d_1(y,z) + d_1(z,y') = d_1(y,y')$.

Proof: If $y \ge z \ge y'$, then for all atoms $[ij]$

$abs(y_{[ij]} - z_{[ij]}) = y_{[ij]} - z_{[ij]}$,

$abs(z_{[ij]} - y'_{[ij]}) = z_{[ij]} - y'_{[ij]}$,

$abs(y_{[ij]} - y'_{[ij]}) = y_{[ij]} - y'_{[ij]}$,

and of course

$y_{[ij]} - z_{[ij]} + z_{[ij]} - y'_{[ij]} = y_{[ij]} - y'_{[ij]}$;

hence $d_1(y,z) + d_1(z,y') = d_1(y,y')$. ■

The same does not hold for $d_2$, which conversely satisfies triangle inequality
with equality if and only if $z$ lies on the line segment between $y$ and $y'$.
Turning attention to the meet $y \wedge y'$ of fuzzy partitions $y, y' \in \mathbb{P}_N$, firstly
consider that for the characteristic functions $\chi_A, \chi_B, \chi_{A \cap B}$, $A, B \in 2^N$ of subsets
the meet or intersection is given by $\chi_{A \cap B}(i) = \chi_A(i)\chi_B(i)$ for all $i \in N$, i.e.
by the product. Analogously, the meet of partitions $P, Q, P \wedge Q \in \mathcal{P}^N$, with
indicator functions $I_P, I_Q, I_{P \wedge Q} \in ex(\mathbb{P}_N)$, is given by

$I_{P \wedge Q}([ij]) = I_P([ij]) I_Q([ij])$ for all atoms $[ij]$,

as $P \wedge Q = \bigvee_{P \ge [ij] \le Q} [ij]$. Then, the meet $y \wedge y'$ of fuzzy partitions $y, y' \in \mathbb{P}_N$
also obtains through the product: $(y \wedge y')_{[ij]} = y_{[ij]} y'_{[ij]}$ for all atoms $[ij]$.

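Proposition 21 For all $y, y' \in \mathbb{P}_N$,

$d_1(y, y \wedge y') + d_1(y \wedge y', y') - d_1(y,y') = 2 \sum_{1 \le i < j \le n} \left( \min\{y_{[ij]}, y'_{[ij]}\} - y_{[ij]} y'_{[ij]} \right)$.

Proof: Firstly note that $y_{[ij]}, y'_{[ij]} \in [0,1]$ entails $y_{[ij]} \ge y_{[ij]} y'_{[ij]} \le y'_{[ij]}$.
Now, $d_1(y, y \wedge y') + d_1(y \wedge y', y') - d_1(y,y') =$

$= \sum_{1 \le i < j \le n} \left[ y_{[ij]} - y_{[ij]} y'_{[ij]} + y'_{[ij]} - y_{[ij]} y'_{[ij]} - \left( \max\{y_{[ij]}, y'_{[ij]}\} - \min\{y_{[ij]}, y'_{[ij]}\} \right) \right] =$

$= 2 \sum_{1 \le i < j \le n} \left( \min\{y_{[ij]}, y'_{[ij]}\} - y_{[ij]} y'_{[ij]} \right)$, as wanted. ■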
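
The identity of Proposition 21 holds coordinate by coordinate, so it can be sanity-checked numerically on arbitrary points of the unit cube. The sketch below is an illustration under that assumption: the sampled vectors are random points of $[0,1]^6$ and are not guaranteed to lie in the polytope of partitions, which does not matter for an entrywise identity. The variable names are hypothetical.

```python
import random


def d1(y, z):
    """l1 distance between two vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(y, z))


random.seed(0)
gaps = []
for _ in range(1000):
    y = [random.random() for _ in range(6)]        # six atoms when n = 4
    y_prime = [random.random() for _ in range(6)]
    meet = [a * b for a, b in zip(y, y_prime)]     # entrywise product
    lhs = d1(y, meet) + d1(meet, y_prime) - d1(y, y_prime)
    rhs = 2 * sum(min(a, b) - a * b for a, b in zip(y, y_prime))
    gaps.append(abs(lhs - rhs))

largest_gap = max(gaps)
```

The left-hand side is the excess of the path through the meet over the direct $\ell_1$ distance; it is non-negative because $\min\{a,b\} \ge ab$ on $[0,1]$.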

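A similar expression may be provided for the squared Euclidean distance or
$\ell_2$ norm, $d_2^2(y, y \wedge y') + d_2^2(y \wedge y', y') - d_2^2(y, y')$.

Proposition 22 $d_2^2(y, y \wedge y') + d_2^2(y \wedge y', y') - d_2^2(y, y') =$

$= 2 \sum_{1 \le i < j \le n} y_{[ij]} y'_{[ij]} \left( 1 - y_{[ij]} - y'_{[ij]} + y_{[ij]} y'_{[ij]} \right)$ for all $y, y' \in \mathbb{P}_N$.

Proof: By direct substitution: $d_2^2(y, y \wedge y') + d_2^2(y \wedge y', y') - d_2^2(y, y') =$

$= \sum_{1 \le i < j \le n} \left[ (y_{[ij]} - y_{[ij]} y'_{[ij]})^2 + (y_{[ij]} y'_{[ij]} - y'_{[ij]})^2 - (y_{[ij]} - y'_{[ij]})^2 \right] =$

$= 2 \sum_{1 \le i < j \le n} \left[ y_{[ij]} y'_{[ij]} \left( 1 - y_{[ij]} - y'_{[ij]} + y_{[ij]} y'_{[ij]} \right) \right]$, as wanted. ■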
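
For readers who wish to see the "direct substitution" spelled out, the expansion for a single atom is sketched below, writing $a = y_{[ij]}$ and $b = y'_{[ij]}$ as shorthand (these two symbols are introduced here only for brevity). The final factorization is an elementary observation, not a claim made in the text, and shows that each summand is non-negative on $[0,1]$.

```latex
\begin{align*}
(a - ab)^2 + (ab - b)^2 - (a - b)^2
  &= \left(a^2 - 2a^2 b + a^2 b^2\right) + \left(a^2 b^2 - 2ab^2 + b^2\right)
     - \left(a^2 - 2ab + b^2\right) \\
  &= 2ab - 2a^2 b - 2ab^2 + 2a^2 b^2 \\
  &= 2ab\,(1 - a - b + ab) \\
  &= 2ab\,(1 - a)(1 - b).
\end{align*}
```

Summing over all atoms gives the expression in Proposition 22.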

The join $\vee$ of two (fuzzy) partitions leads to a more complex setting, because
it brings about the closure yielding the partition lattice as the polygon
matroid defined on the edges of the complete graph $K_N = (N, N_2)$ (see Sections
1, 2). As already observed, for $P, Q \in \mathcal{P}^N$, the meet $P \wedge Q = \bigvee_{P \ge [ij] \le Q} [ij]$
is coarser than all and only those atoms $[ij]$ finer than both $P$ and $Q$.
Thus, the meet of partitions is basically the analog of the intersection of
subsets $A, B \in 2^N$, and indeed in the same way obtains through the pair-wise
product of indicator functions $I_P, I_Q$ (see above). Conversely, when regarded
as a pair-wise operation between indicator functions $I_P, I_Q$, the join is very
different from the union of subsets. In particular, recall that for $A, B \in 2^N$,
with characteristic functions $\chi_A, \chi_B \in \{0,1\}^n$, the union $A \cup B$ obtains as
follows: $\chi_{A \cup B}(i) = \max\{\chi_A(i), \chi_B(i)\}$ for all $i \in N$. In words, Boolean vector
$\chi_{A \cup B} \in \{0,1\}^n$ has entry 1 where $\chi_A$ and/or $\chi_B$ have entry 1. The same
does not apply to partitions $P, Q \in \mathcal{P}^N$, as indicator function $I_{P \vee Q} \in \{0,1\}^{\binom{n}{2}}$
may have entry 1 even where both $I_P$ and $I_Q$ have entry 0. As before, this
can be observed already in the simple case where $N = \{1,2,3\}$. To this end,
arrange the entries of $I_P$, $P = P_\bot, [12], [13], [23], P^\top$ by
$I_P = (I_P([12]), I_P([13]), I_P([23]))^T$; hence $I_{[12]} = (1,0,0)^T$ and $I_{[23]} = (0,0,1)^T$.
Then, $\max\{I_{[12]}, I_{[23]}\} = (1,0,1)^T$ but $I_{[12] \vee [23]} = (1,1,1)^T$,
i.e. $I_{[12] \vee [23]} \neq \max\{I_{[12]}, I_{[23]}\}$.


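Definition 23 In terms of indicator functions $I_P, I_Q, I_{P \vee Q} \in \{0,1\}^{\binom{n}{2}}$, for all
atoms $[ij]$ the join $P \vee Q$ of partitions $P, Q \in \mathcal{P}^N$ is

$I_{P \vee Q}([ij]) = \max\left\{ I_P([ij]), I_Q([ij]), \max_{i' \in N \setminus \{i,j\}} I_P([ii']) I_Q([ji']) \right\}$.

In the same way, the join $(y \vee y')_{[ij]}$ of fuzzy partitions $y, y' \in \mathbb{P}_N$ is given
by $(y \vee y')_{[ij]} = \max\left\{ y_{[ij]}, y'_{[ij]}, \max_{i' \in N \setminus \{i,j\}} y_{[ii']} y'_{[ji']} \right\}$.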
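
The gap between the entrywise maximum and the join can be reproduced on the three-element case discussed above. The sketch below is an illustration, not the formula of Definition 23: it computes the join of two hard partitions in the standard lattice sense, by merging blocks transitively with a small union-find structure. The function name `join_indicator` and the dictionary representation are assumptions made here.

```python
from itertools import combinations


def join_indicator(ip, iq, n):
    """Indicator of P v Q from the indicators of P and Q.

    ip and iq are dicts on pairs (i, j), i < j, over elements 1..n.
    Elements are merged transitively (union-find), which is the usual
    lattice join of partitions."""
    parent = list(range(n + 1))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for (i, j) in ip:
        if ip[(i, j)] or iq[(i, j)]:
            parent[find(i)] = find(j)
    return {(i, j): int(find(i) == find(j)) for (i, j) in ip}


pairs = list(combinations(range(1, 4), 2))        # atoms [12], [13], [23]
i12 = {p: int(p == (1, 2)) for p in pairs}        # indicator of atom [12]
i23 = {p: int(p == (2, 3)) for p in pairs}        # indicator of atom [23]

entrywise_max = {p: max(i12[p], i23[p]) for p in pairs}
joined = join_indicator(i12, i23, 3)
```

Here `entrywise_max` has entry 0 at the pair (1, 3), while `joined` has entry 1 there, because elements 1 and 3 end up in the same block once 2 links them.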


8 Appendix: the consensus partition problem 

Hamming distance between partitions HD was considered for the first time in the
mid '60s [?] in terms of the consensus (or central) partition problem, which is
important in many applied scenarios concerned with statistical classification.
From a combinatorial optimization perspective, a generic instance of the problem
consists of an $m$-collection $P_1,\dots,P_m \in \mathcal{P}^N$, $m \ge 2$, and the problem is characterized by
first selecting a measure of the distance between any two partitions, i.e. a
metric $\delta : \mathcal{P}^N \times \mathcal{P}^N \to \mathbb{R}_+$. Given this, the objective is to find a partition $P$
minimizing the sum of its distances from the $m$ partitions. That is to say, any $P$
satisfying $\sum_{1 \le k \le m} \delta(P, P_k) \le \sum_{1 \le k \le m} \delta(Q, P_k)$ for all $Q \in \mathcal{P}^N$ is a consensus
partition. For generic $\delta$, finding a solution $P$ is typically hard. In particular,
if $\delta = MMD$, then each distance $\delta(Q, P_k)$, $1 \le k \le m$ for any $Q \in \mathcal{P}^N$ is
computable in $O(n^3)$ time [25, p. 236], whereas if $\delta = HD$, then in view of
expression (6) above (see Section 3) each distance $\delta(Q, P_k)$ is computable more
rapidly through scalar products. In any case, independently of the chosen
metric $\delta$, the main issue is that the size $|\mathcal{P}^N|$ of the search space (the $n$-th
Bell number) makes all approaches relying on direct enumeration simply unviable, at least
for relevant values of $n$. The problem is thus commonly approached through
heuristics [?], and if $m$ is large and/or $P_1,\dots,P_m$ are very far from each
other, then figuring out where to concentrate the search is the fundamental issue.
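
The remark on scalar products can be illustrated as follows. The sketch assumes the form $HD(P,Q) = s(P) + s(Q) - 2s(P \wedge Q)$, where the size $s$ counts the atoms finer than a partition; this is the form used later in this appendix, and it is taken here as a stand-in for expression (6), which is not reproduced in this section. The helper names `indicator`, `dot` and `hd` are hypothetical.

```python
from itertools import combinations


def indicator(blocks, n):
    """Indicator vector of a partition over the pairs i < j of 1..n."""
    return [int(any(i in b and j in b for b in blocks))
            for i, j in combinations(range(1, n + 1), 2)]


def dot(u, v):
    """Scalar product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def hd(p, q, n):
    """HD(P, Q) = s(P) + s(Q) - 2 s(P ^ Q), with sizes as scalar products."""
    ip, iq = indicator(p, n), indicator(q, n)
    return dot(ip, ip) + dot(iq, iq) - 2 * dot(ip, iq)


p = [{1, 2, 3}, {4}]
q = [{1, 2}, {3, 4}]
meet = [{1, 2}, {3}, {4}]          # P ^ Q

hd_pq = hd(p, q, 4)                # 3 + 2 - 2 * 1
hd_via_meet = hd(p, meet, 4) + hd(meet, q, 4)
```

In this small case the path through the meet has the same length as the direct distance, in line with the horizontal collinearity invoked for $m = 2$.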

Although the consensus problem is generally hard, especially in terms of
the required exploration of $\mathcal{P}^N$, the analysis conducted thus far identifies
conditions where exact solutions are easy to find. In fact, if the chosen metric is
a minimum-$f$-weight partition distance, i.e. $\delta = \delta_f$ with $f \in F$, and weighting
function $f$ is either supermodular or else submodular (but not both, see below),
then either the meet $P = P_1 \wedge \cdots \wedge P_m$ or else the join $P = P_1 \vee \cdots \vee P_m$
of instance elements is a consensus partition. Specifically, the former case
applies to Hamming distance or size-based $\delta_s = HD$ and to logical entropy-based
$\delta_h$, while the latter applies to rank-based $\delta_r$ and to co-size-based $\delta_{cs}$. Hence,
the computational burden reduces solely to assessing the $m$ distances between
instance elements and their meet (or else their join), with no need for search.

Proposition 24 If distances between partitions are measured by HD, then the
meet of all instance elements achieves consensus, i.e.

$\sum_{1 \le k \le m} HD(P_1 \wedge \cdots \wedge P_m, P_k) \le \sum_{1 \le k \le m} HD(Q, P_k)$

for all $Q \in \mathcal{P}^N$ and all instances $\mathcal{I} = \{P_1,\dots,P_m\} \subseteq \mathcal{P}^N$.

Proof: Firstly note that for $m = 2$ this consensus condition is in fact a
restatement of horizontal collinearity and triangle inequality (see Propositions 1
and 2). Hence, in order to use induction, assume that the condition holds for
some $m \ge 2$, and denote by $P$ the solution or consensus partition of an
$(m+1)$-instance $P_1,\dots,P_m,P_{m+1}$. By assumption, $P_1 \wedge \cdots \wedge P_m$ is a solution of instance
$P_1,\dots,P_m$, thus novel solution $P$ minimizes the sum of its distances from the
previous solution $P_1 \wedge \cdots \wedge P_m$ and from the novel instance element $P_{m+1}$, i.e.

$HD(P_1 \wedge \cdots \wedge P_m, P) + HD(P, P_{m+1}) \le HD(P_1 \wedge \cdots \wedge P_m, Q) + HD(Q, P_{m+1})$

for all $Q \in \mathcal{P}^N$. Then, horizontal collinearity and triangle inequality entail

$HD(P_1 \wedge \cdots \wedge P_m, P) + HD(P, P_{m+1}) \ge HD(P_1 \wedge \cdots \wedge P_m, P_{m+1})$,

with equality if $P = P_1 \wedge \cdots \wedge P_m \wedge P_{m+1}$. ■

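Concerning the value taken by the sum $\sum_{1 \le k \le m} HD(P_k, P_1 \wedge \cdots \wedge P_m)$ of
distances between instance elements and the consensus partition, observe that
for all $Q \in \mathcal{P}^N$ and all $\mathcal{I} = \{P_1,\dots,P_m\}$

$\sum_{1 \le k \le m} HD(Q, P_k) = \sum_{1 \le k < k' \le m} \frac{HD(P_k, Q) + HD(Q, P_{k'})}{m-1}$.

By triangle inequality,

$\sum_{1 \le k < k' \le m} \frac{HD(P_k, Q) + HD(Q, P_{k'})}{m-1} \ge \sum_{1 \le k < k' \le m} \frac{HD(P_k, P_{k'})}{m-1}$,

with equality if $Q = P_k \wedge P_{k'}$ for all $1 \le k < k' \le m$, which is not possible
unless $m = 2$. Now consider partition function $\mathcal{D}_{\mathcal{I}} : \mathcal{P}^N \to \mathbb{R}_+$ defined by

$\mathcal{D}_{\mathcal{I}}(Q) = \sum_{1 \le k < k' \le m} \frac{HD(P_k, Q) + HD(Q, P_{k'}) - HD(P_k, P_{k'})}{m-1}$

$= \frac{2}{m-1} \sum_{1 \le k < k' \le m} \left[ s(Q) - s(P_k \wedge Q) - s(P_{k'} \wedge Q) + s(P_k \wedge P_{k'}) \right]$,

where $\mathcal{I} = \{P_1,\dots,P_m\}$ denotes the given instance. Function $\mathcal{D}_{\mathcal{I}}$ attains its
minimum at consensus partition $P_{\mathcal{I}} := P_1 \wedge \cdots \wedge P_m$, where

$\mathcal{D}_{\mathcal{I}}(P_{\mathcal{I}}) = \frac{2}{m-1} \sum_{1 \le k < k' \le m} \left[ s(P_k \wedge P_{k'}) - s(P_{\mathcal{I}}) \right]$

as $HD(P_k, P_{\mathcal{I}}) + HD(P_{\mathcal{I}}, P_{k'}) = HD(P_k, P_{k'}) + 2[s(P_k \wedge P_{k'}) - s(P_{\mathcal{I}})]$ for all
$1 \le k < k' \le m$.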

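Exactly the same argument applies to logical entropy-based $\delta_h$, entailing
that $\sum_{P \in \mathcal{I}} \delta_h(P, P_{\mathcal{I}}) \le \sum_{P \in \mathcal{I}} \delta_h(P, Q)$ for all $Q \in \mathcal{P}^N$ and all instances $\mathcal{I}$.

For rank-based $\delta_r$ and co-size-based $\delta_{cs}$ distances, horizontal collinearity
holds in terms of the join (rather than in terms of the meet of any $P, Q \in \mathcal{P}^N$,
see above), meaning that $\delta \in \{\delta_r, \delta_{cs}\}$ yields

$\delta(P, P') + \delta(P', Q) \ge \delta(P, Q)$ for all $P, P', Q \in \mathcal{P}^N$,

with equality if $P' = P \vee Q$. Thus the join (rather than the meet) of instance
elements achieves consensus, i.e.
$\sum_{P \in \mathcal{I}} \delta(P_1 \vee \cdots \vee P_m, P) \le \sum_{P \in \mathcal{I}} \delta(Q, P)$ for all
$Q \in \mathcal{P}^N$ and all instances $\mathcal{I}$, while analogous results apply, mutatis mutandis, to
partition function $\mathcal{D}_{\mathcal{I}}$.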

The setting developed thus far also makes it possible to frame the consensus partition
problem in a novel manner, which in turn widens the spectrum of conceivable
fuzzy models for partitions. In order to briefly outline such new possibilities,
firstly recall that a fuzzy subset of $N$ is a function $q : N \to [0,1]$ or, from
an equivalent geometric perspective, a point $q = (q_1,\dots,q_n) \in [0,1]^n$ in the
$n$-dimensional unit hypercube, where $q_i = q(i)$, $i \in N$. Accordingly, a fuzzy
partition is commonly understood as a partition $P = \{A_1,\dots,A_{|P|}\}$ with
associated $|P|$ points $q^A \in [0,1]^n$, $A \in P$ in the hypercube such that $q^A_i \in (0,1]$ for
all $i \in A$ and all $A \in P$. On the other hand, a fuzzy graph with vertex set
$N$ may be seen as one whose edge set is a fuzzy subset of $N_2$, i.e. a function
$t : N_2 \to [0,1]$ or, from an equivalent geometric perspective, a point in the
$\binom{n}{2}$-dimensional unit hypercube, i.e. $t = (t_{\{i,j\}})_{\{i,j\} \in N_2} \in [0,1]^{\binom{n}{2}}$.

By looking at partitions of $N$ as graphs with vertex set $N$ each of whose
components is complete, fuzzy partitions can be regarded as fuzzy graphs with
complete components. Along this route, the fuzzy consensus partition $t_{\mathcal{I}}$ associated
with instance $\mathcal{I} \subseteq \mathcal{P}^N$ may be defined to be the point in the interior of the
polytope $\mathbb{P}_N$ of partitions (see above) corresponding to the center of the convex
hull $conv(\{I_P : P \in \mathcal{I}\})$ given by all convex combinations of the indicator
functions $I_P$, $P \in \mathcal{I}$ of instance elements. In this way, the fuzzy consensus partition
is a function ranging in the unit interval $[0,1]$ and taking values on the atoms
of $\mathcal{P}^N$. In particular,

$t_{\mathcal{I}}([ij]) = \frac{1}{|\mathcal{I}|} \sum_{P \in \mathcal{I}} I_P([ij])$ for all atoms $[ij]$.

In this framework, the strong patterns of instance $\mathcal{I}$ considered in [39] are
the blocks of partition $P(t_{\mathcal{I}})$ obtained through defuzzification of $t_{\mathcal{I}}$ as follows:

$P(t_{\mathcal{I}}) = \bigvee_{[ij] \,:\, t_{\mathcal{I}}([ij]) = 1} [ij]$.

In words, $P(t_{\mathcal{I}})$ obtains as the join of all atoms where the fuzzy consensus
partition attains its maximum, i.e. 1.
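
A small sketch may help fix ideas on the fuzzy consensus partition and its defuzzification. It rests on one explicit assumption: the "center" of the convex hull is taken to be the arithmetic mean of the indicator functions of instance elements, so that an atom has value 1 exactly when every instance element is coarser than it. The helper names `indicator`, `fuzzy_consensus` and `defuzzify` are hypothetical.

```python
from itertools import combinations


def indicator(blocks, n):
    """Indicator function of a partition over the pairs i < j of 1..n."""
    return {(i, j): int(any(i in b and j in b for b in blocks))
            for i, j in combinations(range(1, n + 1), 2)}


def fuzzy_consensus(instance, n):
    """Assumed center: arithmetic mean of the indicator functions."""
    inds = [indicator(p, n) for p in instance]
    return {a: sum(ind[a] for ind in inds) / len(inds) for a in inds[0]}


def defuzzify(t, n):
    """Blocks of the join of all atoms [ij] with t([ij]) equal to 1."""
    parent = list(range(n + 1))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for (i, j), value in t.items():
        if value == 1:
            parent[find(i)] = find(j)
    blocks = {}
    for v in range(1, n + 1):
        blocks.setdefault(find(v), set()).add(v)
    return sorted(sorted(b) for b in blocks.values())


instance = [[{1, 2, 3}, {4}], [{1, 2}, {3, 4}], [{1, 2, 3, 4}]]
t = fuzzy_consensus(instance, 4)
strong_patterns = defuzzify(t, 4)    # blocks of P(t)
```

For this instance only the pair (1, 2) is grouped together by all three partitions, so the strong patterns are the blocks {1, 2}, {3} and {4}.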


9 Conclusion 

This work considers distances between partitions by focusing on lattice theory 
and relying on discrete methods. Specifically, it firstly develops from the idea 
of reproducing the traditional Hamming distance between subsets by counting 
unordered pairs of partitioned elements or atoms of the partition lattice.
Although counting ordered and/or unordered pairs is not new (see [31, Section
2.1] for a survey), the Hamming distance between partitions HD is here
analyzed from a novel geometric perspective. Special attention is placed on 
the distance between complements in comparison with two alternative partition 
distance measures proposed in recent years, namely MMD and VI. Given its 
low computational complexity combined with fine measurement sensitivity, HD 
may be considered as an alternative to MMD and VI for applications. 

Just as the cardinality of the symmetric difference between subsets is a count of
atoms of a Boolean lattice, HD relies on the size, which counts
the atoms finer than partitions. However, while the cardinality or rank of subsets is
a valuation, i.e. both supermodular and submodular, the size of partitions is
supermodular, in that valuations of the partition lattice are constant partition
functions [5]. Also, in view of expression $|A \cup B| - |A \cap B|$ for the Hamming
distance between subsets $A, B$, it may seem reasonable to consider distances
between partitions $P, Q$ of the form $f(P \vee Q) - f(P \wedge Q)$ for some symmetric
and order-preserving/inverting $f$, i.e. $f \in F$. However, such a distance takes
the same value $f(P^\top) - f(P_\bot)$ whenever $P$ and $Q$ are complementary partitions
(see Section 4), and this should be avoided in view of [32, Theorem 1].

The geometric approach adopted here makes it possible to analyze further partition
distances obtained by replacing the size with alternative partition functions
such as entropy, rank, logical entropy and co-size. In general, any symmetric
and order-preserving/inverting partition function $f$ provides a distance between
partitions $P, Q$ by considering the four values $f(P)$, $f(Q)$, $f(P \wedge Q)$ and $f(P \vee Q)$.
Specifically, $f$ defines weights on edges of the Hasse diagram (or 0/1-polytope) of
partitions such that the so-called minimum-$f$-weight distance between any $P, Q$
is the weight of a lightest $P$-$Q$-path. Depending on whether $f$ is supermodular
or else submodular and order-preserving or else order-inverting, a minimum-$f$-weight
path between $P$ and $Q$ visits their meet or else their join, and vice versa.
These four possibilities are summarized in Table 3, Section 5. In particular,
HD is the minimum-$s$-weight distance $\delta_s$, where partition function $s$ is the size,
while VI is the minimum-$e$-weight distance $\delta_e$, where partition function $e$ is the
entropy.

Any distance is of course normalized when considered as the ratio to its
maximum value $d_{\max}$. On the other hand, it may be relevant to consider such a
maximum as a function $d_{\max}(n)$ of the number $n$ of partitioned elements, with
focus on the first-order difference $\mathcal{D}d_{\max}(n) = d_{\max}(n+1) - d_{\max}(n)$ and on the
second-order one $\mathcal{D}^2 d_{\max}(n) = d_{\max}(n+2) - 2d_{\max}(n+1) + d_{\max}(n)$. For HD
both differences are strictly positive: $\mathcal{D}HD_{\max}(n) = n$ and $\mathcal{D}^2 HD_{\max}(n) = 1$,
and these are exactly the same values $\mathcal{D}|A \Delta B|$, $\mathcal{D}^2|A \Delta B|$ as for the traditional
Hamming distance $|A \Delta B|$ between subsets $A, B$. For entropy-based distance
VI, the former $\mathcal{D}VI_{\max}(n) = \log(n+1) - \log(n)$ is positive while the latter
$\mathcal{D}^2 VI_{\max}(n) = \log(n+2) - 2\log(n+1) + \log(n)$ is negative by concavity of
the log function. For maximum matching distance $\mathcal{D}MMD_{\max}(n) = 1$ while
$\mathcal{D}^2 MMD_{\max}(n) = 0$, and the same applies to rank-based minimum-weight
distance $\delta_r$ outlined in Example 2, Section 5. For logical entropy-based minimum-
weight distance VSh-maxin) = and V‘^Sh-max{n) = ■ 
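
The first- and second-order differences quoted for HD, VI and MMD can be checked with a few lines of code. The closed forms used below are assumptions inferred from the differences stated in the text (they are determined only up to an additive constant): a maximum of $\binom{n}{2}$ for HD, $\log n$ for VI and $n - 1$ for MMD. The function names are hypothetical.

```python
import math


def first_diff(f, n):
    """First-order difference f(n + 1) - f(n)."""
    return f(n + 1) - f(n)


def second_diff(f, n):
    """Second-order difference f(n + 2) - 2 f(n + 1) + f(n)."""
    return f(n + 2) - 2 * f(n + 1) + f(n)


def hd_max(n):
    return math.comb(n, 2)      # assumed maximum of HD on n elements


def vi_max(n):
    return math.log(n)          # assumed maximum of VI on n elements


def mmd_max(n):
    return n - 1                # assumed maximum of MMD on n elements


rows = [(n, first_diff(hd_max, n), second_diff(hd_max, n),
         first_diff(vi_max, n), second_diff(vi_max, n),
         first_diff(mmd_max, n), second_diff(mmd_max, n))
        for n in range(2, 51)]
```

Each row collects, for a given $n$, the two differences of the three assumed maxima; the signs match those reported in the paragraph above for HD, VI and MMD.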

By extending attention from edges and vertices of the 0/1-polytope of partitions
to the whole of this latter, the general approach based on atoms also applies
to the fuzzification of partitions. In particular, fuzzy clusterings or
membership matrices of any dimension are turned into fuzzy subsets of atoms of the
partition lattice, and thus distances between such matrices may be computed
through the common Euclidean norm in $\mathbb{R}^{\binom{n}{2}}$.